311 |
Adaptive modelling and planning for learning intelligent behaviour
Kochenderfer, Mykel J. January 2006
An intelligent agent must be capable of using its past experience to develop an understanding of how its actions affect the world in which it is situated. Given some objective, the agent must be able to effectively use its understanding of the world to produce a plan that is robust to the uncertainty present in the world. This thesis presents a novel computational framework called the Adaptive Modelling and Planning System (AMPS) that aims to meet these requirements for intelligence. The challenge of the agent is to use its experience in the world to generate a model. In problems with large state and action spaces, the agent can generalise from limited experience by grouping together similar states and actions, effectively partitioning the state and action spaces into finite sets of regions. This process is called abstraction. Several different abstraction approaches have been proposed in the literature, but the existing algorithms have many limitations. They generally only increase resolution, require a large amount of data before changing the abstraction, do not generalise over actions, and are computationally expensive. AMPS aims to solve these problems using a new kind of approach. AMPS splits and merges existing regions in its abstraction according to a set of heuristics. The system introduces splits using a mechanism related to supervised learning and is defined in a general way, allowing AMPS to leverage a wide variety of representations. The system merges existing regions when an analysis of the current plan indicates that doing so could be useful. Because several different regions may require revision at any given time, AMPS prioritises revision to best utilise whatever computational resources are available. Changes in the abstraction lead to changes in the model, requiring changes to the plan. AMPS prioritises the planning process, and when the agent has time, it replans in high-priority regions. 
This thesis demonstrates the flexibility and strength of this approach in learning intelligent behaviour from limited experience.
|
312 |
Learning in a state of confusion: employing active perception and reinforcement learning in partially observable worlds
Crook, Paul A. January 2007
In applying reinforcement learning to agents acting in the real world, we are often faced with tasks that are non-Markovian in nature. Much work has been done using state estimation algorithms to try to uncover Markovian models of tasks in order to allow the learning of optimal solutions using reinforcement learning. Unfortunately, these algorithms, which attempt to simultaneously learn a Markov model of the world and how to act, have proved very brittle. Our focus differs. In considering embodied, embedded and situated agents, we have a preference for simple learning algorithms which reliably learn satisficing policies. The learning algorithms we consider do not try to uncover the underlying Markovian states; instead they aim to learn successful deterministic reactive policies such that agents' actions are based directly upon the observations provided by their sensors. Existing results have shown that such reactive policies can be arbitrarily worse than a policy that has access to the underlying Markov process, and that in some cases no satisficing reactive policy can exist. Our first contribution is to show that providing agents with alternative actions and viewpoints on the task through the addition of active perception can provide a practical solution in such circumstances. We demonstrate empirically that: (i) adding arbitrary active perception actions to agents which can only learn deterministic reactive policies can allow the learning of satisficing policies where none were originally possible; (ii) active perception actions allow the learning of better satisficing policies than those that existed previously; and (iii) our approach converges more reliably to satisficing solutions than existing state estimation algorithms such as U-Tree and the Lion Algorithm. Our other contributions focus on issues which affect the reliability with which deterministic reactive satisficing policies can be learnt in non-Markovian environments.
We show that greedy action selection may be a necessary condition for the existence of stable deterministic reactive policies on partially observable Markov decision processes (POMDPs). We also set out the concept of Consistent Exploration. This is the idea of estimating state-action values by acting as though the policy has been changed to incorporate the action being explored. We demonstrate that this concept can be used to develop better algorithms for learning reactive policies for POMDPs by presenting a new reinforcement learning algorithm, the Consistent Exploration Q(λ) algorithm (CEQ(λ)). We demonstrate on a significant number of problems that CEQ(λ) is more reliable at learning satisficing solutions than SARSA(λ), the algorithm currently regarded as the best for learning deterministic reactive policies.
|
313 |
Towards Controlling Latency in Wireless Networks
Bouacida, Nader 24 April 2017
Wireless networks have undergone an unprecedented revolution in the last decade. With the explosion of delay-sensitive applications on the Internet (e.g., online gaming and VoIP), latency has become a major issue for the development of wireless technology. Taking advantage of the significant decline in memory prices, manufacturers equip network devices with larger buffering capacities to improve network throughput by limiting packet drops. Over-buffering increases the time that packets spend in queues and thus introduces more latency into networks. This phenomenon is known as “bufferbloat”. While throughput is the dominant performance metric, latency also has a huge impact on user experience, not only for real-time applications but also for common applications like web browsing, which is sensitive to latencies on the order of hundreds of milliseconds.
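To make the bufferbloat effect concrete: a packet that arrives at a full first-in, first-out buffer waits roughly the buffer size divided by the rate at which the link drains it. The short Python sketch below only illustrates that arithmetic. The function name, buffer sizes, and link rate are assumptions chosen for the example and are not values taken from the thesis.

```python
def worst_case_queueing_delay_ms(buffer_bytes: int, link_rate_bps: float) -> float:
    """Time for a full FIFO buffer to drain at the given link rate, in milliseconds."""
    if link_rate_bps <= 0:
        raise ValueError("link rate must be positive")
    return buffer_bytes * 8 / link_rate_bps * 1000.0


# Hypothetical numbers: a 10 Mbit/s link with a small and a large buffer.
small = worst_case_queueing_delay_ms(64 * 1024, 10e6)
large = worst_case_queueing_delay_ms(1024 * 1024, 10e6)
print(f"64 KiB buffer: {small:.1f} ms, 1 MiB buffer: {large:.1f} ms")
```

Under these assumed numbers the larger buffer alone adds most of a second of delay when full, which is well above the hundreds of milliseconds that the abstract cites as noticeable for web browsing.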
Concerns have arisen about designing sophisticated queue management schemes to mitigate the effects of this phenomenon. My thesis research aims to solve the bufferbloat problem in both traditional half-duplex and cutting-edge full-duplex wireless systems by reducing delay while maximizing wireless link utilization and fairness. Our work sheds light on the behavior of buffer management algorithms in wireless networks and their ability to reduce latency resulting from excessive queuing delays inside oversized static network buffers without a significant loss in other network metrics.
First, we address the problem of buffer management in wireless full-duplex networks by using Wireless Queue Management (WQM), an active queue management technique for wireless networks. Our solution is based on Relay Full-Duplex MAC (RFD-MAC), an asynchronous media access control protocol designed for relay full-duplexing. Compared to the default case, our solution reduces the end-to-end delay by two orders of magnitude while achieving similar throughput in most cases.
In the second part of this thesis, we propose a novel design called “LearnQueue” based on reinforcement learning that can effectively control the latency in wireless networks. LearnQueue adapts quickly and intelligently to changes in the wireless environment using a sophisticated reward structure. Testbed results prove that LearnQueue can guarantee low latency while preserving throughput.
|
314 |
An Evaluation of Negative Reinforcement During Error Correction Procedures
Maillard, Gloria Nicole 12 1900
This study evaluated the effects of error correction procedures on sight word acquisition. Participants were four typically developing children in kindergarten and first grade. We used an adapted alternating treatment design embedded within a multiple baseline design to evaluate the instructional efficacy of two error correction procedures (one with preferred items plus error correction and one with error correction only), and a concurrent chain schedule to evaluate participant preference for instructional procedure. The results show that there was no difference in acquisition rates between the procedures. The evaluation also showed that children prefer procedures that include a positive reinforcement component.
|
315 |
Využití opakovaně posilovaného učení pro řízení čtyřnohého robotu / Using of Reinforcement Learning for Four Legged Robot Control
Ondroušek, Vít January 2011
This Ph.D. thesis focuses on using reinforcement learning for four-legged robot control. The main aim is to create an adaptive control system for a walking robot that can plan its walking gait using the Q-learning algorithm. This aim is achieved by designing a complex three-layered architecture based on the DEDS paradigm. A small set of elementary reactive behaviors forms the basis of the proposed solution. A set of composite control laws is designed using simultaneous activations of these behaviors. Both types of controllers are able to operate on flat terrain as well as on rugged terrain. A model of all possible behaviors that can be achieved by activating these controllers is built using an appropriate discretization of the continuous state space. This model is used by the Q-learning algorithm to find optimal robot control strategies. The capabilities of the control unit are demonstrated on three complex tasks: rotating the robot, walking in a straight line, and walking on an inclined plane. These tasks are solved using spatial dynamic simulations of a four-legged robot with three degrees of freedom in each leg. The resulting walking gaits are evaluated using standardized quantitative indicators. Video files showing the elementary and composite controllers in action, as well as the resulting walking gaits, are an integral part of this thesis.
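The abstract describes Q-learning applied to a discretized state space. As a rough illustration of that general technique only, the Python sketch below runs tabular Q-learning on a toy five-cell corridor. The environment, function names, and hyperparameters are all assumptions made for this example; none of them come from the thesis, whose robot model and controller set are far richer.

```python
import random


def q_learning(n_states, n_actions, step, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and random tie-breaking."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0  # every episode starts in state 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)  # explore
            else:
                best = max(q[s])  # exploit, breaking ties at random
                a = rng.choice([i for i in range(n_actions) if q[s][i] == best])
            s2, r, done = step(s, a)
            # One-step Q-learning target: reward plus discounted best next value.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


def chain_step(s, a):
    """Hypothetical corridor: action 1 moves right, action 0 moves left.

    Reaching cell 4 gives reward 1 and ends the episode.
    """
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    done = s2 == 4
    return s2, (1.0 if done else 0.0), done


q_table = q_learning(n_states=5, n_actions=2, step=chain_step)
greedy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
print(greedy)
```

The learned greedy policy can be read off the table by taking the highest-valued action in each state, which is how a discretized model of this kind would be turned into a control strategy.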
|
316 |
Evolutionary reinforcement learning of spoken dialogue strategies
Toney, Dave January 2007
From a system developer's perspective, designing a spoken dialogue system can be a time-consuming and difficult process. A developer may spend a lot of time anticipating how a potential user might interact with the system and then deciding on the most appropriate system response. These decisions are encoded in a dialogue strategy, essentially a mapping between anticipated user inputs and appropriate system outputs. To reduce the time and effort associated with developing a dialogue strategy, recent work has concentrated on modelling the development of a dialogue strategy as a sequential decision problem. Using this model, reinforcement learning algorithms have been employed to generate dialogue strategies automatically. These algorithms learn strategies by interacting with simulated users. Some progress has been made with this method but a number of important challenges remain. For instance, relatively little success has been achieved with the large state representations that are typical of real-life systems. Another crucial issue is the time and effort associated with the creation of simulated users. In this thesis, I propose an alternative to existing reinforcement learning methods of dialogue strategy development. More specifically, I explore how XCS, an evolutionary reinforcement learning algorithm, can be used to find dialogue strategies that cover large state spaces. Furthermore, I suggest that hand-coded simulated users are sufficient for the learning of useful dialogue strategies. I argue that the use of evolutionary reinforcement learning and hand-coded simulated users is an effective approach to the rapid development of spoken dialogue strategies. Finally, I substantiate this claim by evaluating a learned strategy with real users. Both the learned strategy and a state-of-the-art hand-coded strategy were integrated into an end-to-end spoken dialogue system. 
The dialogue system allowed real users to make flight enquiries using a live database for an Edinburgh-based airline. The performance of the learned and hand-coded strategies was compared. The evaluation results show that the learned strategy performs as well as the hand-coded one (81% and 77% task completion respectively) but takes much less time to design (two days instead of two weeks). Moreover, the learned strategy compares favourably with previous user evaluations of learned strategies.
|
317 |
[en] USING REINFORCEMENT LEARNING ON WEB PAGES REVISITING PROBLEM / [pt] APRENDIZADO POR REFORÇO SOBRE O PROBLEMA DE REVISITAÇÃO DE PÁGINAS WEB
EUGENIO PACELLI FERREIRA DIAS JUNIOR 14 June 2012
On the Internet, the information we want is frequently spread over different locations. Some applications, to work correctly, need to maintain local copies of part of this information. Keeping a database consistent and fresh, or more specifically a set of copies of web pages, is a task that has been studied systematically. One possible approach to this problem is to apply reinforcement learning techniques, which use dynamic programming and stochastic analysis to obtain a good policy for scheduling updates of the web page copies. This work aims to validate the use of reinforcement learning techniques on this problem, and to identify aspects of the problem that may be useful in modelling the solution employed.
|
318 |
Safety verification of model based reinforcement learning controllers using reachability analysis
Akshita Gupta (7047728) 13 August 2019
Reinforcement Learning (RL) is a data-driven technique that is finding increasing application in the development of controllers for sequential decision-making problems. The wide adoption of RL controllers can be attributed to the fact that their development does not depend on knowledge of the system, so they can be used even when the environment dynamics are unknown. Model-Based RL controllers explicitly model the system dynamics from the observed (training) data using a function approximator, and then use a path planning algorithm to obtain the optimal control sequence. While these controllers have proven successful in simulations, the lack of strong safety guarantees in the presence of noise makes them ill-suited for deployment on hardware, especially in safety-critical systems. The proposed work aims to bridge this gap by providing a verification framework to evaluate the safety guarantees of a Model-Based RL controller. Our method builds upon reachability analysis to determine whether there is any action that can drive the system into a constrained (unsafe) region. Consequently, our method can provide a binary yes-or-no answer to whether the states in the initial set are safe to propagate trajectories from in the presence of bounded noise.
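The abstract does not say how the reachable sets are computed, so the following Python sketch is only a simplified illustration of the general idea of reachability-based safety checking. It assumes a one-dimensional closed-loop system x[k+1] = a * x[k] + w with bounded noise |w| <= w_max, and represents sets as intervals. The dynamics, function names, and numbers are all hypothetical.

```python
def reach_step(lo, hi, a, w_max):
    """Propagate the interval [lo, hi] one step through x' = a * x + w, |w| <= w_max."""
    ends = (a * lo, a * hi)  # a may be negative, so sort the endpoints
    return min(ends) - w_max, max(ends) + w_max


def is_safe(initial, unsafe, a, w_max, horizon):
    """Return True if no reachable interval up to `horizon` steps touches `unsafe`."""
    lo, hi = initial
    u_lo, u_hi = unsafe
    for _ in range(horizon + 1):  # checks the initial set and each of the next steps
        if lo <= u_hi and u_lo <= hi:  # the two intervals overlap
            return False
        lo, hi = reach_step(lo, hi, a, w_max)
    return True


# Hypothetical example: a contracting system with small versus large noise bounds.
print(is_safe((-1.0, 1.0), (2.0, 3.0), a=0.5, w_max=0.1, horizon=20))
print(is_safe((-1.0, 1.0), (2.0, 3.0), a=0.5, w_max=1.5, horizon=20))
```

For this scalar linear case the interval is the exact reachable set. For a learned nonlinear model, an interval or other set representation would generally be an over-approximation, so a "safe" answer remains sound while an "unsafe" answer may be conservative.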
|
319 |
Using Deep Reinforcement Learning For Adaptive Traffic Control in Four-Way Intersections
Jörneskog, Gustav, Kandelan, Josef January 2019
The consequences of traffic congestion include increased travel time, fuel consumption, and the number of crashes. Studies suggest that most traffic delays are due to nonrecurring traffic congestion. Adaptive traffic control using real-time data is effective in dealing with nonrecurring traffic congestion. Many adaptive traffic control algorithms used today are deterministic and prone to human error and limitation. Reinforcement learning allows the development of an optimal traffic control policy in an unsupervised manner. We have implemented a reinforcement learning algorithm that only requires information about the number of vehicles and the mean speed of each incoming road to streamline traffic in a four-way intersection. The reinforcement learning algorithm is evaluated against a deterministic algorithm and a fixed-time control schedule. Furthermore, it was tested whether reinforcement learning can be trained to prioritize emergency vehicles while maintaining good traffic flow. The reinforcement learning algorithm obtains a lower average time in the system than the deterministic algorithm in eight out of nine experiments. Moreover, the reinforcement learning algorithm achieves a lower average time in the system than the fixed-time schedule in all experiments. At best, the reinforcement learning algorithm performs 13% better than the deterministic algorithm and 39% better than the fixed-time schedule. Moreover, the reinforcement learning algorithm could prioritize emergency vehicles while maintaining good traffic flow.
|
320 |
Aprendizado por reforço relacional para o controle de robôs sociáveis / Relational reinforcement learning to control sociable robots
Silva, Renato Ramos da 10 March 2009
Artificial intelligence seeks not only to understand intelligent entities but also to build them. Intelligence can be divided into several factors, one of which is learning. The field of machine learning aims to develop techniques for automatic learning by machines, including computers, robots, or any other device. Reinforcement learning, one of these techniques, is the main focus of this work. More specifically, relational reinforcement learning (RRL) was investigated, which represents in relational form the learning obtained through direct interaction with the environment. RRL is of particular interest in robotics because, in general, a model of the environment is not available and economical use of resources is required. The RRL technique was investigated in the context of learning for a robotic head. A modification of the RRL algorithm, called TGE (ETG in the Portuguese abstract), was proposed and incorporated into a control architecture for a robotic head. The architecture was evaluated on a non-trivial real problem: learning shared attention. The results show that, by using TGE, the architecture is able to exhibit appropriate behaviours during a controlled social interaction. A comparative analysis with other methods shows that the proposed algorithm achieved superior performance in most of the experiments.
|