About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
361

Oppositional Reinforcement Learning with Applications

Shokri, Maryam 05 September 2008 (has links)
Machine intelligence techniques contribute to solving real-world problems. Reinforcement learning (RL) is one such technique, with several characteristics that make it suitable for applications where a model of the environment is not available to the agent. In real-world applications, intelligent agents generally face a very large state space, which limits the usability of reinforcement learning. The convergence conditions of reinforcement learning require that each state-action pair be visited infinitely often, a requirement that is impossible to satisfy in many practical situations. The goal of this work is to propose a class of new techniques to overcome this problem for off-policy, step-by-step (incremental), model-free reinforcement learning with discrete state and action spaces. The focus of this research is on using the design characteristics of the RL agent to improve its running time while maintaining an acceptable level of accuracy. One way to improve the performance of an intelligent agent is to use a model of the environment, but in many applications the model may be known only partially or not at all. In this work, a special type of knowledge about the agent's actions is therefore employed instead: the concept of opposition is introduced into the reinforcement learning framework. For each action we define an associated opposite action. Actions and opposite actions are used together to update the value function, resulting in faster convergence. At the beginning of this research, the concept of opposition was incorporated into the components of reinforcement learning (states, actions, and the reinforcement signal), resulting in the oppositional target domain estimation (OTE) algorithm. OTE reduces the search and navigation area and accelerates the search for a target. The OTE algorithm is, however, limited to applications in which a model of the environment is provided to the agent. Hence, further investigation was conducted to extend the concept of opposition to model-free reinforcement learning, yielding several algorithms that apply opposition to the Q(lambda) technique. The design of a reinforcement learning agent depends on the application; the emphasis of this research is on the characteristics of the actions. The primary challenge of this work is therefore the design and incorporation of opposite actions into the framework of RL agents. Three applications, namely grid navigation, the elevator control problem, and image thresholding, are implemented to address this challenge in different contexts; the design challenges and some solutions for overcoming problems and improving the algorithms are also investigated. The opposition-based Q(lambda) algorithms are tested on these applications. The general idea behind them is that in Q-value updating, the agent updates the value of an action in a given state; if the agent also knows the value of the opposite action, it can update two Q-values at the same time without actually taking the opposite action or making an explicit transition to the opposite state.
If the agent knows the values of both an action and its opposite action in a given state, it can update two Q-values at once. This accelerates the learning process in general and the exploration phase in particular. Several algorithms are outlined in this work. OQ(lambda) is introduced to accelerate the Q(lambda) algorithm in discrete state spaces. NOQ(lambda) extends OQ(lambda) to operate in a broader range of non-deterministic environments. The update of the opposition trace in OQ(lambda) depends on the next state of the opposite action (which is generally not taken by the agent); this limits the technique to deterministic environments, where that next state is known. NOQ(lambda) updates the opposition trace without requiring knowledge of the next state of the opposite action. The results show improved running time for the proposed algorithms compared to the standard Q(lambda) technique.
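To make the double-update idea concrete, here is a minimal Python sketch of an opposition-based Q-update, assuming a grid world where each move has a natural opposite. The one-step Q-learning-style update, the `opposite_action` map, and the estimated opposite reward and next state are illustrative assumptions, not the thesis's actual OQ(lambda) implementation.

```python
import numpy as np

n_states, n_actions = 25, 4          # e.g., a 5x5 grid with N/S/E/W moves
alpha, gamma = 0.1, 0.95
Q = np.zeros((n_states, n_actions))

def opposite_action(a):
    # Assumed pairing for a grid world: N<->S (0,1), E<->W (2,3).
    return {0: 1, 1: 0, 2: 3, 3: 2}[a]

def opposition_update(s, a, r, s_next, r_opp, s_opp):
    """Update Q(s, a) from the observed transition, and Q(s, opposite(a))
    from an estimated opposite reward/next state, without the agent
    actually taking the opposite action."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    a_opp = opposite_action(a)
    Q[s, a_opp] += alpha * (r_opp + gamma * Q[s_opp].max() - Q[s, a_opp])
```

One observed transition thus refreshes two entries of the Q-table, which is where the claimed acceleration of the exploration phase comes from.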
362

Storage System Management Using Reinforcement Learning Techniques and Nonlinear Models

Mahootchi, Masoud January 2009 (has links)
In this thesis, modeling and optimization for storage management under stochastic conditions are investigated using two different methodologies: simulation optimization techniques (SOT), which are usually categorized in the area of reinforcement learning (RL), and nonlinear modeling techniques (NMT). For the first set of methods, simulation plays a fundamental role in evaluating the control policy: learning techniques deliver sub-optimal policies at the end of a learning process. These iterative methods rely on agents interacting with the stochastic environment by taking actions and observing different states. To converge to the steady-state condition, where policies and value functions no longer change significantly as learning continues, all states (or at least the most important ones) must be visited sufficiently often, which can be prohibitively time-consuming for large-scale problems. To make these techniques more efficient in terms of both computation time and the robustness of the resulting policies, the idea of opposition-based learning (OBL, Type I and Type II) is employed to modify and extend popular RL techniques including Q-learning, Q(λ), Sarsa, and Sarsa(λ); several new algorithms are developed using this idea. It is also illustrated that function approximation techniques such as neural networks can contribute to the learning process. State-of-the-art implementations usually consider only the maximization of the expected accumulated reward; extending these techniques to account for risk, and solving some well-known control problems, are important contributions of this thesis. Furthermore, the nonlinear model for reservoir management using indicator functions and randomized policies introduced by Fletcher and Ponnambalam is extended to stochastic releases in multi-reservoir systems. In this extension, two different approaches for defining the release policies are proposed, and the restriction of assuming normally distributed inflows is relaxed by using a beta-equivalent general distribution. A five-reservoir case study from India demonstrates the benefits of these new developments, and a warehouse management problem is used to outline the application of the proposed method to other storage management problems.
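For reference, the following is a minimal sketch of tabular Sarsa(λ), one of the base techniques the thesis extends with opposition-based learning. The `env` interface (`reset`, `step`) is an assumed placeholder, and no OBL modification is shown.

```python
import numpy as np

def sarsa_lambda(env, n_states, n_actions, episodes=500,
                 alpha=0.1, gamma=0.95, lam=0.9, eps=0.1):
    """Tabular Sarsa(lambda) with epsilon-greedy exploration and
    accumulating eligibility traces."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        E = np.zeros_like(Q)                 # eligibility traces
        s = env.reset()
        a = np.random.randint(n_actions) if np.random.rand() < eps else Q[s].argmax()
        done = False
        while not done:
            s2, r, done = env.step(a)        # assumed env interface
            a2 = np.random.randint(n_actions) if np.random.rand() < eps else Q[s2].argmax()
            delta = r + gamma * Q[s2, a2] * (not done) - Q[s, a]
            E[s, a] += 1.0                   # mark the visited pair
            Q += alpha * delta * E           # propagate TD error along traces
            E *= gamma * lam                 # decay all traces
            s, a = s2, a2
    return Q
```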
363

Feature Selection for Value Function Approximation

Taylor, Gavin January 2011 (has links)
The field of reinforcement learning concerns automated action selection given past experiences. As an agent moves through the state space, it must recognize which states are best in terms of allowing it to reach its goal. This is quantified with value functions, which evaluate a state and return the sum of rewards the agent can expect to receive from that state onward. Given a good value function, the agent can choose the actions that maximize this sum of rewards. Value functions are often chosen from a linear space defined by a set of features; this method offers a concise structure, low computational effort, and resistance to overfitting. However, because the number of features is small, the method depends heavily on those few features being expressive and useful, making feature selection a core problem. This document addresses that selection.

Aside from a review of the field, contributions include a new understanding of the role approximate models play in value function approximation, leading to new methods for analyzing feature sets in an intuitive way, using both the linear and the related kernelized approximation architectures. Additionally, we present a new method for automatically choosing features during value function approximation which has a bounded approximation error and produces superior policies, even in extremely noisy domains. / Dissertation
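As a concrete illustration of the linear architecture discussed above, here is a short sketch of TD(0) learning on a weight vector over a feature map, so that V(s) = w · φ(s). The feature map `phi` and the sampled transitions are assumptions for illustration, not the dissertation's method.

```python
import numpy as np

def td0_linear(transitions, phi, n_features, alpha=0.05, gamma=0.95):
    """Linear TD(0): learn weights w so that V(s) ~= w . phi(s).
    transitions: iterable of (s, r, s_next, done) samples."""
    w = np.zeros(n_features)
    for s, r, s_next, done in transitions:
        v = w @ phi(s)
        v_next = 0.0 if done else w @ phi(s_next)
        w += alpha * (r + gamma * v_next - v) * phi(s)  # TD error times gradient
    return w
```

With only a handful of features, everything hinges on φ capturing the right structure, which is exactly the selection problem the dissertation targets.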
364

A Study on Architecture, Algorithms, and Applications of Approximate Dynamic Programming Based Approach to Optimal Control

Lee, Jong Min 12 July 2004 (has links)
This thesis develops approximate dynamic programming (ADP) strategies suitable for process control problems, aimed at overcoming two limitations of model predictive control (MPC): the potentially exorbitant on-line computational requirement and the inability to consider the future interplay between uncertainty and estimation in the optimal control calculation. The suggested approach solves the dynamic program only for the state points visited by closed-loop simulations under judiciously chosen control policies. This helps combat the well-known 'curse of dimensionality' of traditional DP while allowing the user to derive an improved control policy from the initial ones. The critical issue in the suggested method is the proper choice and design of the function approximator; a local averager with a penalty term is proposed to guarantee a stably learned control policy as well as acceptable on-line performance. The thesis also demonstrates the versatility of the proposed ADP strategy on difficult process control problems. First, a stochastic adaptive control problem is presented, in which an ADP-based control policy shows an 'active' probing property that reduces uncertainties, leading to better control performance. The second example is a dual-mode controller, a supervisory scheme that actively prevents the progression of abnormal situations under a local controller at their onset. Finally, two ADP strategies, one model-based and one model-free, are suggested for controlling nonlinear processes from input-output data; they have the advantage of conveniently incorporating knowledge of the identification data distribution into the control calculation, with improved performance.
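A hedged sketch of what a local averager with a distance penalty might look like follows; the inverse-distance weights and the penalty form are illustrative assumptions, not the thesis's exact estimator.

```python
import numpy as np

def local_average_value(x, X, V, k=5, penalty=1.0):
    """Estimate the value at query state x by averaging the values of
    its k nearest visited states, penalizing queries that lie far from
    the data to discourage extrapolation.
    x: query state (d,); X: visited states (n, d); V: their values (n,)."""
    d = np.linalg.norm(X - x, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-8)                # inverse-distance weights (assumed)
    v_hat = np.dot(w, V[idx]) / w.sum()      # local average of stored values
    return v_hat - penalty * d[idx].min()    # assumed extrapolation penalty
```

Because the estimate never leaves the convex hull of stored values by much, such averagers behave stably during iterative policy improvement, which is the property the thesis exploits.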
365

Reinforcement Learning for Active Length Control and Hysteresis Characterization of Shape Memory Alloys

Kirkpatrick, Kenton C. 16 January 2010 (has links)
Shape Memory Alloy actuators can be used for morphing, or shape change, by controlling their temperature, which is effectively done by applying a voltage difference across their length. Control of these actuators requires determining the relationship between voltage and strain so that an input-output map can be developed. In this research, a computer simulation uses a hyperbolic tangent curve to simulate the hysteresis behavior of a virtual Shape Memory Alloy wire in temperature-strain space, and uses a reinforcement learning algorithm called Sarsa to learn a near-optimal control policy and map the hysteretic region. The algorithm developed in simulation is then applied to an experimental apparatus in which a Shape Memory Alloy wire is characterized in temperature-strain space. The algorithm is then modified so that learning is done in voltage-strain space, allowing a control policy to be learned that provides a direct input-output mapping of voltage to position for a real wire. This research was successful in achieving its objectives. In the simulation phase, the reinforcement learning algorithm proved capable of controlling a virtual Shape Memory Alloy wire by determining an accurate input-output map of temperature to strain. The virtual model was also shown to be accurate for characterizing Shape Memory Alloy hysteresis, validated by comparison to the commonly used modified Preisach model. The validated algorithm was successfully applied to an experimental apparatus, in which both major and minor hysteresis loops were learned in temperature-strain space. Finally, the modified algorithm was able to learn the control policy in voltage-strain space, achieving all learned goal states within a tolerance of ±0.5% strain, or ±0.65 mm; this policy can reach any learned goal from any initial strain state. This research has validated that reinforcement learning can determine a control policy for Shape Memory Alloy crystal phase transformations, and it opens the door for research into the development of length-controllable Shape Memory Alloy actuators.
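In the spirit of the simulation described above, here is an illustrative sketch of a tanh-based major hysteresis loop in temperature-strain space; all constants (branch midpoints, transition width, maximum strain) are invented for illustration and are not the thesis's calibrated values.

```python
import numpy as np

def strain(T, heating, eps_max=0.04):
    """Strain of a virtual SMA wire at temperature T (Celsius) on the
    heating or cooling branch of the major hysteresis loop. The offset
    between branch midpoints is what produces the hysteresis."""
    T_center = 70.0 if heating else 50.0     # assumed branch midpoints
    width = 8.0                              # assumed transition width
    # Strain decreases as the wire heats through the phase transformation.
    return eps_max * 0.5 * (1.0 - np.tanh((T - T_center) / width))

T = np.linspace(20, 100, 9)
print([round(strain(t, heating=True), 4) for t in T])   # heating branch
print([round(strain(t, heating=False), 4) for t in T])  # cooling branch
```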
366

The Design and Evaluation of Intelligent Sales-agent for Online Persuasion and Negotiation

Huang, Shiu-li 23 July 2005 (has links)
Purchasing products from online e-stores is becoming popular with advances in Internet infrastructure and network security. At the current stage, most e-stores resemble vending machines rather than real stores because they lack clerks to persuade prospects into buying products and to bargain with customers to make a good deal. This research aims to design an easy-to-use, autonomous sales agent, called Isa, to act as a virtual clerk in an e-store. A new approach is proposed to enable the agent to dynamically adopt different persuasion and negotiation strategies according to the characteristics of individual human buyers. Additionally, this approach enables the sales agent to learn the best strategies without the seller's instructions. Both laboratory and field experiments are conducted to assess Isa's performance. The experimental results reveal that Isa can improve a seller's surplus and increase a buyer's product evaluation, willingness to pay more for the product, and satisfaction with visiting the e-store.
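One simple way to realize per-buyer strategy adaptation is a bandit-style learner, sketched below. This is an assumption-laden illustration (the segment keys, strategy names, and reward signal are invented), not Isa's actual mechanism.

```python
import random
from collections import defaultdict

STRATEGIES = ["scarcity", "social_proof", "discount_offer"]  # hypothetical

class StrategyLearner:
    """Epsilon-greedy selection of a persuasion strategy per buyer segment,
    learning each strategy's payoff from observed outcomes."""
    def __init__(self, eps=0.1):
        self.eps = eps
        self.q = defaultdict(lambda: {s: 0.0 for s in STRATEGIES})
        self.n = defaultdict(lambda: {s: 0 for s in STRATEGIES})

    def choose(self, segment):
        if random.random() < self.eps:
            return random.choice(STRATEGIES)          # explore
        return max(self.q[segment], key=self.q[segment].get)  # exploit

    def learn(self, segment, strategy, reward):
        # Incremental mean update of the strategy's estimated payoff.
        self.n[segment][strategy] += 1
        n, q = self.n[segment][strategy], self.q[segment][strategy]
        self.q[segment][strategy] = q + (reward - q) / n
```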
367

Anti-Spam Study: an Alliance-based Approach

Chiu, Yu-fen 12 September 2006 (has links)
The growing problem of spam has generated a need for reliable anti-spam filters. Many filtering techniques, along with machine learning and data mining, are used to reduce the amount of spam. Such algorithms can achieve very high accuracy, but with some false-positive tradeoff, and false positives are generally prohibitively expensive in the real world. Much work has been done to improve specific algorithms for detecting spam, but less has been reported on leveraging multiple algorithms in email analysis. This study presents an alliance-based approach to classify spam and to discover and exchange interesting information about it. The spam filter in this study is built on a mixture of rough set theory (RST), a genetic algorithm (GA), and the XCS classifier system. RST has the ability to process imprecise and incomplete data such as spam; GA can speed up the search for an optimal solution (i.e., the rules used to block spam); and the reinforcement learning of XCS is a good mechanism for suggesting the appropriate classification for an email. The results of spam filtering with the alliance-based approach are evaluated by several statistical methods and show strong performance. Two main conclusions can be drawn from this study: (1) the rules exchanged from other mail servers indeed help the filter block more spam than before; and (2) a combination of algorithms improves accuracy and reduces false positives in spam detection.
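The alliance idea of exchanging rules between mail servers can be sketched simply; the keyword-weight rule format below is an invented simplification of the RST/GA/XCS rules actually used in the study.

```python
def merge_rules(local_rules, *peer_rule_sets):
    """Union local rules with rules received from allied servers,
    keeping the strongest weight seen for each keyword."""
    merged = dict(local_rules)
    for rules in peer_rule_sets:
        for kw, w in rules.items():
            merged[kw] = max(merged.get(kw, 0.0), w)
    return merged

def is_spam(email_text, rules, threshold=1.0):
    """Score an email by summing the weights of matched rules."""
    score = sum(w for kw, w in rules.items() if kw in email_text.lower())
    return score >= threshold

rules = merge_rules({"viagra": 0.9}, {"lottery": 0.8, "viagra": 0.7})
print(is_spam("You won the LOTTERY! Buy viagra now", rules))  # True
```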
368

Abstraction In Reinforcement Learning

Girgin, Sertan 01 March 2007 (has links) (PDF)
Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment. Generally, the problem to be solved contains subtasks that repeat in different regions of the state space. Without any guidance, an agent has to learn the solutions to all subtask instances independently, which degrades learning performance. In this thesis, we propose two approaches that build connections between different regions of the search space, leading to better utilization of gained experience and accelerated learning. In the first approach, we extend the existing work of McGovern and formalize stochastic conditionally terminating sequences, which have higher representational power. We then describe how to efficiently discover and employ useful abstractions during learning based on such sequences. The method constructs a tree structure to keep track of frequently used action sequences together with visited states; this tree is then used to select the action to execute at each step. In the second approach, we propose a novel method to identify states with similar sub-policies and show how they can be integrated into the reinforcement learning framework to improve learning performance. The method uses an efficient data structure to find common action sequences starting from observed states and defines a similarity function between states based on the number of such sequences. Using this similarity function, updates to the action-value function of a state are reflected onto all similar states, allowing experience acquired during learning to be applied in a broader context. The effectiveness of both approaches is demonstrated empirically through extensive experiments on various domains.
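The tree of frequently used action sequences can be illustrated with a small trie that counts observed sequences; the sketch below omits the thesis's state tracking and action-selection logic.

```python
from collections import defaultdict

class SequenceNode:
    def __init__(self):
        self.count = 0
        self.children = defaultdict(SequenceNode)

class SequenceTree:
    """Trie over action sequences: each path is a sequence, each node
    counts how often that prefix has been observed."""
    def __init__(self):
        self.root = SequenceNode()

    def record(self, actions):
        """Count every prefix of an observed action sequence."""
        node = self.root
        for a in actions:
            node = node.children[a]
            node.count += 1

    def frequency(self, actions):
        """How many recorded sequences start with this prefix."""
        node = self.root
        for a in actions:
            if a not in node.children:
                return 0
            node = node.children[a]
        return node.count

tree = SequenceTree()
tree.record(["north", "north", "east"])
tree.record(["north", "north", "west"])
print(tree.frequency(["north", "north"]))  # 2
```

Counting shared prefixes like this also supports the second approach's similarity function, which compares states by the number of common action sequences that start from them.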
369

Scaling solutions to Markov Decision Problems

Zang, Peng 14 November 2011 (has links)
The Markov Decision Problem (MDP) is a widely applied mathematical model useful for describing a wide array of real-world decision problems, ranging from navigation to scheduling to robotics. Existing methods for solving MDPs scale poorly when applied to large domains with many components and factors to consider. In this dissertation, I study the use of non-tabular representations and human input as scaling techniques. I show that the joint approach has desirable optimality and convergence guarantees, and demonstrates several orders of magnitude of speedup over conventional tabular methods. Empirical studies of speedup were performed on several domains, including a clone of the classic video game Super Mario Bros. In the course of this work, I address several issues, including: how approximate representations can be used without losing convergence and optimality properties, how human input can be solicited to maximize speedup and user engagement, and how that input should be used to insulate against possible errors.
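For contrast, here is the tabular value-iteration baseline whose per-state-action cost motivates the scaling techniques studied here; the dense `P` and `R` arrays are assumed for illustration, and it is exactly this table that becomes intractable in large domains.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Tabular value iteration.
    P: (A, S, S) transition probabilities; R: (A, S) expected rewards.
    Returns the optimal values and a greedy policy."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)                # Bellman optimality backup
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```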
370

Application of reinforcement learning to multi-agent production scheduling

Wang, Yi-Chi. January 2003 (has links)
Thesis (Ph. D.)--Mississippi State University. Department of Industrial Engineering. / Title from title screen. Includes bibliographical references.
