171. Discretization and Approximation Methods for Reinforcement Learning of Highly Reconfigurable Systems. Lampton, Amanda K., December 2009.
There are a number of techniques used to solve reinforcement learning problems, but very few have been developed for and tested on highly reconfigurable systems cast as reinforcement learning problems. A reconfigurable system is a vehicle (air, ground, or water) or collection of vehicles that can change its geometrical features, i.e. shape or formation, to perform tasks that the vehicle could not otherwise accomplish. These systems tend to be optimized for several operating conditions, and controllers are then designed to reconfigure the system from one operating condition to another. Q-learning, an unsupervised episodic learning technique that solves the reinforcement learning problem, is an attractive control methodology for reconfigurable systems. It has been successfully applied to a myriad of control problems, and a number of variations have been developed to avoid or alleviate limitations in earlier versions of the approach. This dissertation describes the development of three modular enhancements to the Q-learning algorithm that address some of the unique problems that arise when working with this class of systems, such as the complex interaction of reconfigurable parameters and computationally intensive models of the systems. A multi-resolution state-space discretization method is developed that adaptively rediscretizes the state space with progressively finer grids around one or more distinct Regions of Interest within the state or learning space. A genetic algorithm that autonomously selects the basis functions used to approximate the action-value function is applied periodically throughout the learning process. Policy comparison is added to monitor the policy encoded in the action-value function and prevent unnecessary episodes at each level of discretization. The approach is validated on several problems, including an inverted pendulum, a reconfigurable airfoil, and a reconfigurable wing. Results show that the multi-resolution state-space discretization method reduces the number of state-action pairs required to achieve a specific goal, often by an order of magnitude, and that policy comparison prevents unnecessary episodes once a usable policy has been reached. Results also show that the genetic algorithm is a promising candidate for selecting basis functions for function approximation of the action-value function.
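For readers unfamiliar with the base algorithm, the sketch below shows the standard tabular Q-learning update that these enhancements modify. The state and action counts, hyperparameters, and epsilon-greedy exploration are illustrative assumptions, not the dissertation's models.

```python
import numpy as np

# Minimal tabular Q-learning sketch (illustrative; not the dissertation's code).
# States and actions are assumed to be small discrete index sets.
n_states, n_actions = 100, 4
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # assumed hyperparameters
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def select_action(s):
    # Epsilon-greedy exploration over the current action-value estimates.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def update(s, a, r, s_next):
    # One-step Q-learning backup: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```

The dissertation's enhancements (multi-resolution rediscretization, basis-function selection, policy comparison) wrap around this core update rather than replace it.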
172. A unifying framework for computational reinforcement learning theory. Li, Lihong, 2009.
Thesis (Ph.D.), Rutgers University, 2009. Graduate Program in Computer Science. Includes bibliographical references (pp. 238-261).
173. Autonomous qualitative learning of distinctions and actions in a developing agent. Mugan, Jonathan William, 23 November 2010.
How can an agent bootstrap up from a pixel-level representation to autonomously learn high-level states and actions using only domain-general knowledge? This thesis attacks a piece of this problem: it assumes that an agent has a set of continuous variables describing the environment and a set of continuous motor primitives, and proposes a solution to the problem of how an agent can learn a set of useful states and effective higher-level actions through autonomous experience with the environment. Methods exist for learning models of the environment, and methods exist for planning; however, for autonomous learning, these methods have been used almost exclusively in discrete environments.
This thesis proposes attacking the problem of learning high-level states and actions in continuous environments by using a qualitative representation to bridge the gap between continuous and discrete variable representations. In this approach, the agent begins with a broad discretization and initially can only tell whether the value of each variable is increasing, decreasing, or remaining steady. The agent then simultaneously learns a qualitative representation (discretization) and a set of predictive models of the environment, converts these models into plans that form actions, and uses those learned actions to explore the environment.
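As a rough illustration of the qualitative bridge described above, the sketch below maps a continuous variable's change to the three initial qualitative values (increasing, decreasing, steady). The dead-band threshold and function names are assumptions for illustration, not taken from the thesis.

```python
# Illustrative sketch of a qualitative abstraction over one continuous variable
# (assumed threshold; not the thesis's actual discretization machinery).
STEADY_THRESHOLD = 1e-3  # assumed dead-band for "remaining steady"

def qualitative_direction(prev_value: float, value: float) -> str:
    """Map the change in a continuous variable to a qualitative value."""
    delta = value - prev_value
    if delta > STEADY_THRESHOLD:
        return "increasing"
    if delta < -STEADY_THRESHOLD:
        return "decreasing"
    return "steady"

# Example: a slowly rising sensor reading is abstracted to "increasing".
print(qualitative_direction(0.50, 0.53))  # -> "increasing"
```

In the thesis's approach, landmarks refining this coarse three-way split would be learned from experience, so the discretization itself improves as the predictive models improve.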
The method is evaluated using a simulated robot with realistic physics. The robot sits at a table that contains one or two blocks, as well as other distractor objects that are out of reach. The agent autonomously explores the environment without being given a task. After learning, the agent is given various tasks to determine whether it learned the necessary states and actions to complete them. The results show that the agent was able to use this method to autonomously learn to perform the tasks.
174. Model-based active learning in hierarchical policies. Cora, Vlad M.
Hierarchical task decompositions play an essential role in the design of complex simulation and decision systems, such as those that arise in video games. Game designers find it natural to adopt a divide-and-conquer philosophy of specifying hierarchical policies, in which decision modules can be constructed somewhat independently. The process of choosing the parameters of these modules manually is typically lengthy and tedious. The hierarchical reinforcement learning (HRL) field has produced elegant ways of decomposing policies and value functions using semi-Markov decision processes, but demonstrations in larger nonlinear systems with discrete and continuous variables are still lacking. To narrow this gap between industrial practice and academic ideas, we address the problem of designing efficient algorithms that facilitate the deployment of HRL ideas in more realistic settings. In particular, we propose Bayesian active learning methods that learn the relevant aspects of either policies or value functions by focusing on the most relevant parts of the parameter and state spaces, respectively. To demonstrate the scalability of our solution, we applied it to The Open Racing Car Simulator (TORCS), a 3D game engine that implements complex vehicle dynamics. The environment is a large topological map roughly based on downtown Vancouver, British Columbia. Higher-level abstract tasks are also learned in this process using a model-based extension of the MAXQ algorithm. Our solution demonstrates how HRL can be scaled to large applications with complex discrete and continuous nonlinear dynamics.
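For context, the MAXQ framework that the thesis extends (Dietterich, 2000) decomposes the value of a parent task into the value of a chosen subtask plus a completion term: Q(i, s, a) = V(a, s) + C(i, s, a). A minimal recursive sketch of that decomposition follows; the task graph and tables are illustrative assumptions, not the thesis's model-based extension.

```python
# Minimal sketch of the MAXQ value decomposition (after Dietterich, 2000):
# Q(i, s, a) = V(a, s) + C(i, s, a), where C is the completion function.
# The task graph, primitive rewards, and completion values are illustrative.
from typing import Dict, List, Tuple

children: Dict[str, List[str]] = {"Root": ["Navigate", "Overtake"]}  # assumed tasks
C: Dict[Tuple[str, str, str], float] = {}       # completion function C[(task, state, subtask)]
V_primitive: Dict[Tuple[str, str], float] = {}  # expected reward of primitive actions

def V(task: str, state: str) -> float:
    """Value of performing `task` starting in `state`."""
    if task not in children:  # primitive action: just its expected reward
        return V_primitive.get((task, state), 0.0)
    return max(Q(task, state, a) for a in children[task])

def Q(task: str, state: str, subtask: str) -> float:
    """MAXQ decomposition: subtask value plus completion of the parent task."""
    return V(subtask, state) + C.get((task, state, subtask), 0.0)
```

The appeal for game designers is that each subtask's value can be learned or tuned somewhat independently, matching the modular way the policies are authored.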
175. A service-oriented approach to topology formation and resource discovery in wireless ad-hoc networks. Gonzalez Valenzuela, Sergio.
The past few years have witnessed a significant evolution in mobile computing and communications, in which new trends and applications have transformed the traditional role of computer networks into that of distributed service providers. In this thesis we explore an alternative way to form wireless ad-hoc networks whose topologies can be customized as required by the users’ software applications. In particular, we investigate the applicability of mobile codes to networks created by devices equipped with Bluetooth technology. Computer simulation results suggest that our proposed approach can achieve this task effectively, while matching the level of efficiency seen in other salient proposals in this area.
This thesis also addresses the issue of service discovery in mobile ad-hoc networks. We propose the use of a directory whose network location varies in an attempt to reduce traffic overhead driven by users’ hosts looking for service information. We refer to this scheme as the Service Directory Placement Algorithm, or SDPA. We formulate the directory relocation problem as a Markov Decision Process that is solved by using Q-learning. Performance evaluations through computer simulations reveal bandwidth overhead reductions that range between 40% and 48% when compared with a basic broadcast flooding approach for networks comprising hosts moving at pedestrian speeds. We then extend our proposed approach and introduce a multi-directory service discovery system called the Service Directory Placement Protocol, or SDPP. Our findings reveal bandwidth overhead reductions typically ranging from 15% to 75% in networks comprising slow-moving hosts with restricted memory availability.
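As a rough illustration of casting directory relocation as a Markov Decision Process, the sketch below names a possible state, action set, and reward signal. These definitions are assumptions for illustration, not SDPA's actual formulation; Q-learning would then estimate long-run overhead for each state-action pair, as in the update sketched under entry 171.

```python
# Hedged sketch: directory relocation cast as an MDP. The state, action,
# and reward definitions are illustrative assumptions, not SDPA's own.
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class DirectoryState:
    directory_host: str      # which host currently stores the service directory
    neighborhood_load: int   # coarse measure of nearby query traffic (assumed)

def actions(state: DirectoryState, neighbors: List[str]) -> List[str]:
    # The directory can stay put or migrate to a one-hop neighbor.
    return ["stay"] + [f"move:{h}" for h in neighbors]

def reward(overhead_bytes: int) -> float:
    # Lower service-discovery traffic observed since the last decision
    # epoch means higher reward (assumed shaping).
    return -float(overhead_bytes)
```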
In the fourth and final part of this work, we present the design foundations and architecture of a middleware system called WISEMAN – WIreless Sensors Employing Mobile Agents. We employ WISEMAN to dispatch and process mobile programs in Wireless Sensor Networks (WSNs). Our proposed system enables the dynamic creation of semantic relationships between network nodes that cooperate to provide an aggregate service. We discuss the advantages of our proposed approach and, in particular, how WISEMAN facilitates the realization of service-oriented tasks in WSNs.
176. RELPH: A Computational Model for Human Decision Making. Mohammadi Sepahvand, Nazanin, January 2013.
The updating process, which consists of building mental models and adapting them to changes occurring in the environment, is impaired in neglect patients. A simple rock-paper-scissors experiment was conducted in our lab to examine updating impairments in neglect patients. The results demonstrate a significant difference between the performance of healthy and brain-damaged participants: while healthy controls had no difficulty learning the computer’s strategy, right-brain-damaged patients failed to learn it. A computational modeling approach is employed to help us better understand the reason behind this difference, and thus to learn more about the updating process in healthy people and its impairment in right-brain-damaged patients. The broader hope is to learn more about the nature of the updating process in general, and that knowing what must be changed in the model to “brain-damage” it can shed light on the updating deficit in right-brain-damaged patients. To this end, I adapted a pattern detection method named “ELPH” into a reinforcement-learning model of human decision making called “RELPH”. This model is capable of capturing the behavior of both healthy and right-brain-damaged participants in our task, according to our defined measures. This thesis is an effort to discuss the possible differences among these groups using this computational model.
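ELPH-style models predict an opponent's next move from hypotheses over recent history, pruning unreliable (high-entropy) hypotheses; RELPH adds a reinforcement-learning component. The sketch below shows only a bare-bones single-history frequency predictor for rock-paper-scissors to convey the flavor; the history length and all details are illustrative assumptions, with the entropy pruning and reinforcement machinery omitted.

```python
# Bare-bones sketch of history-based opponent prediction in rock-paper-scissors.
# Illustrative assumption, not the thesis's model: one fixed-length history
# table instead of ELPH/RELPH's pruned hypothesis space.
from collections import defaultdict, deque

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

history = deque(maxlen=2)                       # last two opponent moves (assumed)
counts = defaultdict(lambda: defaultdict(int))  # counts[history][next_move]

def observe(move: str) -> None:
    # Record which move followed the current two-move history, then advance.
    if len(history) == history.maxlen:
        counts[tuple(history)][move] += 1
    history.append(move)

def respond() -> str:
    # Predict the opponent's most frequent continuation and play its counter.
    dist = counts.get(tuple(history))
    if dist:
        predicted = max(dist, key=dist.get)
        return BEATS[predicted]
    return "rock"  # arbitrary default before any evidence accumulates
```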
177. Reinforcement Learning and Simulation-Based Search in Computer Go. Silver, David, date unknown.
No description available.
178. Dynamic Tuning of PI-Controllers based on Model-free Reinforcement Learning Methods. Abbasi Brujeni, Lena, date unknown.
No description available.
179. Online Learning for Linearly Parametrized Control Problems. Abbasi-Yadkori, Yasin, date unknown.
No description available.
180. Complying with norms: a neurocomputational exploration. Colombo, Matteo, January 2012.
The subject matter of this thesis can be summarized by a triplet of questions and answers. Showing what these questions and answers mean is, in essence, the goal of my project. The triplet goes like this:
Q: How can we make progress in our understanding of social norms and norm compliance?
A: Adopting a neurocomputational framework is one effective way to make progress in our understanding of social norms and norm compliance.
Q: What could the neurocomputational mechanism of social norm compliance be?
A: The mechanism of norm compliance probably consists of Bayesian reinforcement learning algorithms implemented by activity in certain neural populations.
Q: What could information about this mechanism tell us about social norms and social norm compliance?
A: Information about this mechanism tells us that:
a1: Social norms are uncertainty-minimizing devices.
a2: Social norm compliance is one trick that agents employ to interact coadaptively and smoothly in their social environment.
Most of the existing treatments of norms and norm compliance (e.g. Bicchieri 2006; Binmore 1993; Elster 1989; Gintis 2010; Lewis 1969; Pettit 1990; Sugden 1986; Ullmann‐Margalit 1977) consist in what Cristina Bicchieri (2006) refers to as “rational reconstructions.” A rational reconstruction of the concept of social norm “specifies in which sense one may say that norms are rational, or compliance with a norm is rational” (ibid., pp. 10-11). What sets my project apart from these treatments is that it aims, first and foremost, at providing a description of some core aspects of the mechanism of norm compliance. The single most original idea put forth in my project is to bring an alternative explanatory framework to bear on social norm compliance: the framework of computational cognitive neuroscience. The chapters of this thesis describe some ways in which central issues concerning social norms can be fruitfully addressed within a neurocomputational framework. In order to qualify and articulate the triplet above, my strategy consists firstly in laying down the beginnings of a model of the mechanism of norm-compliance behaviour, and then zooming in on specific aspects of the model. Such a model, the chapters of this thesis argue, explains apparently important features of the psychology and neuroscience of norm compliance, and helps us to understand the nature of the social norms we live by.
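Where the abstract says the compliance mechanism "probably consists of Bayesian reinforcement learning algorithms," a toy beta-Bernoulli belief update conveys the flavor: an agent tracks uncertainty about whether conforming to a norm yields positive social outcomes and updates that belief with each interaction. Everything below is an illustrative assumption, not the thesis's model.

```python
# Toy Bayesian belief update (beta-Bernoulli) in the spirit of Bayesian RL.
# Entirely illustrative; not the thesis's neurocomputational model.

class NormBelief:
    def __init__(self, alpha: float = 1.0, beta: float = 1.0):
        # Beta(alpha, beta) prior over P(positive social outcome | comply).
        self.alpha, self.beta = alpha, beta

    def update(self, positive_outcome: bool) -> None:
        # Conjugate update: count successes and failures.
        if positive_outcome:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def uncertainty(self) -> float:
        # Variance of the Beta posterior; reading norms as uncertainty-minimizing
        # devices, repeated compliant interaction drives this downward.
        n = self.alpha + self.beta
        return (self.alpha * self.beta) / (n * n * (n + 1.0))
```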