361 |
The Design and Evaluation of Intelligent Sales-agent for Online Persuasion and Negotiation
Huang, Shiu-li, 23 July 2005 (has links)
Purchasing products from online e-stores is becoming popular with the advance of Internet infrastructure and network security. At the current stage, most e-stores resemble vending machines rather than real stores because they lack clerks to persuade prospects into buying products and to bargain with customers to make a good deal. This research aims to design an easy-to-use and autonomous sales-agent, called Isa, to act as a virtual clerk in an e-store. A new approach is proposed to enable the agent to dynamically adopt different persuasion and negotiation strategies according to different characteristics of human buyers. Additionally, this approach enables a sales-agent to learn the best strategies without the seller's instructions. Both laboratory and field experiments are conducted to assess Isa's performance. The experimental results reveal that Isa can improve a seller's surplus and increase a buyer's product evaluation, willingness to pay more for the product, and satisfaction with visiting the e-store.
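The abstract does not detail Isa's learning mechanism; one natural reading of "dynamically adopt different strategies according to buyer characteristics" and "learn the best strategies without the seller's instructions" is a contextual bandit. A minimal sketch in Python, where the buyer segments, strategy names, and reward signal are illustrative assumptions rather than the thesis's actual design:

```python
import random
from collections import defaultdict

# Hypothetical buyer segments and strategies; the thesis's actual
# taxonomy of persuasion/negotiation tactics is not specified here.
SEGMENTS = ["price_sensitive", "quality_seeker", "impulsive"]
STRATEGIES = ["scarcity_appeal", "social_proof", "concession_bargain"]

class StrategySelector:
    """Epsilon-greedy contextual bandit: one value estimate per
    (buyer segment, strategy) pair, updated from realized surplus."""
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.value = defaultdict(float)   # mean observed reward
        self.count = defaultdict(int)

    def choose(self, segment):
        if random.random() < self.epsilon:
            return random.choice(STRATEGIES)
        return max(STRATEGIES, key=lambda s: self.value[(segment, s)])

    def update(self, segment, strategy, reward):
        key = (segment, strategy)
        self.count[key] += 1
        # incremental mean update
        self.value[key] += (reward - self.value[key]) / self.count[key]

# toy interaction loop with a synthetic reward signal
agent = StrategySelector()
for _ in range(1000):
    seg = random.choice(SEGMENTS)
    strat = agent.choose(seg)
    reward = random.gauss(1.0 if strat == "concession_bargain" else 0.5, 0.2)
    agent.update(seg, strat, reward)
```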
|
362 |
Anti-Spam Study: an Alliance-based Approach
Chiu, Yu-fen, 12 September 2006 (has links)
The growing problem of spam has generated a need for reliable anti-spam filters. Many filtering techniques, along with machine learning and data mining, are used to reduce the amount of spam. Such algorithms can achieve very high accuracy, but at the cost of some false positives. Generally, false positives are prohibitively expensive in the real world. Much work has been done to improve specific algorithms for the task of detecting spam, but less work has been reported on leveraging multiple algorithms in email analysis. This study presents an alliance-based approach to classify, discover, and exchange interesting information on spam. Furthermore, the spam filter in this study is built on a mixture of rough set theory (RST), a genetic algorithm (GA), and the XCS classifier system.
RST has the ability to process imprecise and incomplete data such as spam. The GA can speed up the rate of finding the optimal solution (i.e., the rules used to block spam). The reinforcement learning of XCS is a good mechanism for suggesting the appropriate classification for an email. The results of spam filtering by the alliance-based approach are evaluated by several statistical methods and show strong performance. Two main conclusions can be drawn from this study: (1) the rules exchanged from other mail servers indeed help the filter block more spam than before; (2) a combination of algorithms improves accuracy and reduces false positives for the problem of spam detection.
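For concreteness, here is a minimal sketch of the XCS-style reinforcement update that the filter's learning component relies on: each rule carries a payoff prediction and an error estimate that are nudged toward the observed reward. The attribute names, constants, and toy rules are illustrative, not taken from the thesis:

```python
BETA = 0.2  # learning rate, as in standard XCS parameterizations

class Rule:
    def __init__(self, condition, action):
        self.condition = condition    # predicate over email content/features
        self.action = action          # "spam" or "ham"
        self.prediction = 10.0        # expected payoff
        self.error = 0.0              # running payoff error
        self.fitness = 0.1

    def matches(self, email):
        return self.condition(email)

def xcs_update(match_set, reward):
    """Widrow-Hoff style updates of error and prediction, as in XCS."""
    for rule in match_set:
        rule.error += BETA * (abs(reward - rule.prediction) - rule.error)
        rule.prediction += BETA * (reward - rule.prediction)

# toy usage: reward 100 for a correct classification, 0 otherwise
rules = [Rule(lambda e: "viagra" in e, "spam"),
         Rule(lambda e: "meeting" in e, "ham")]
email, truth = "buy viagra now", "spam"
for r in [r for r in rules if r.matches(email)]:
    xcs_update([r], 100.0 if r.action == truth else 0.0)
```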
|
363 |
Abstraction In Reinforcement Learning
Girgin, Sertan, 01 March 2007 (links) (PDF)
Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment. Generally, the problem to be solved contains subtasks that repeat at different regions of the state space. Without any guidance an agent has to learn the solutions of all subtask instances independently, which degrades the learning performance.
In this thesis, we propose two approaches that build connections between different regions of the search space, leading to better utilization of gained experience and accelerated learning. In the first approach, we extend the existing work of McGovern and propose a formalization of stochastic conditionally terminating sequences with higher representational power. Then, we describe how to efficiently discover and employ useful abstractions during learning based on such sequences. The method constructs a tree structure to keep track of frequently used action sequences together with visited states. This tree is then used to select actions to be executed at each step.
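A minimal sketch of such a tree: a trie over executed action sequences with visit counts, from which the agent can look up frequent continuations of the current action prefix. The method names and the frequency-based suggestion heuristic are assumptions for illustration:

```python
class SequenceNode:
    def __init__(self):
        self.children = {}   # action -> SequenceNode
        self.count = 0       # how often this prefix was observed

class SequenceTree:
    def __init__(self):
        self.root = SequenceNode()

    def record(self, actions):
        """Insert one executed action sequence, bumping prefix counts."""
        node = self.root
        for a in actions:
            node = node.children.setdefault(a, SequenceNode())
            node.count += 1

    def suggest(self, prefix):
        """Most frequent continuation of a prefix, or None."""
        node = self.root
        for a in prefix:
            node = node.children.get(a)
            if node is None:
                return None
        if not node.children:
            return None
        return max(node.children.items(), key=lambda kv: kv[1].count)[0]

tree = SequenceTree()
tree.record(["north", "north", "pickup"])
tree.record(["north", "north", "dropoff"])
tree.record(["north", "north", "pickup"])
print(tree.suggest(["north", "north"]))  # -> "pickup"
```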
In the second approach, we propose a novel method to identify states with similar sub-policies, and show how they can be integrated into the reinforcement learning framework to improve learning performance. The method uses an efficient data structure to find common action sequences starting from observed states and defines a similarity function between states based on the number of such sequences. Using this similarity function, updates on the action-value function of a state are reflected onto all similar states. This, consequently, allows experience acquired during learning to be applied in a broader context.
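A minimal sketch of the update-sharing idea, assuming a similarity measure derived from counts of shared action sequences and a fixed similarity threshold (both stand-ins for the thesis's actual definitions):

```python
from collections import defaultdict

ALPHA, GAMMA, TAU = 0.1, 0.95, 0.7   # illustrative constants

Q = defaultdict(float)                # (state, action) -> value
common_seqs = defaultdict(int)        # (s1, s2) -> shared sequence count
total_seqs = defaultdict(int)         # s -> number of observed sequences

def similarity(s1, s2):
    denom = max(total_seqs[s1], total_seqs[s2], 1)
    return common_seqs[(s1, s2)] / denom

def shared_update(s, a, r, s_next, all_states, actions):
    """One Q-learning update, mirrored onto sufficiently similar states."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    delta = ALPHA * (target - Q[(s, a)])
    Q[(s, a)] += delta
    for t in all_states:
        if t != s and similarity(s, t) >= TAU:
            Q[(t, a)] += delta        # reflect the update to similar states

# toy usage: s1 and s2 share 3 of 4 observed sequences, so both update
total_seqs.update({"s1": 4, "s2": 4})
common_seqs[("s1", "s2")] = 3
shared_update("s1", "a", 1.0, "s2", ["s1", "s2"], ["a", "b"])
```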
The effectiveness of both approaches is demonstrated empirically through extensive experiments on various domains.
|
364 |
Scaling solutions to Markov Decision Problems
Zang, Peng, 14 November 2011 (has links)
The Markov Decision Problem (MDP) is a widely applied mathematical model useful for describing a wide array of real-world decision problems, ranging from navigation to scheduling to robotics. Existing methods for solving MDPs scale poorly when applied to large domains where there are many components and factors to consider.
In this dissertation, I study the use of non-tabular representations and human input as scaling techniques. I will show that the joint approach has desirable optimality and convergence guarantees, and demonstrates several orders of magnitude speedup over conventional tabular methods. Empirical studies of speedup were performed using several domains including a clone of the classic video game, Super Mario Bros. In the course of this work, I will address several issues including: how approximate representations can be used without losing convergence and optimality properties, how human input can be solicited to maximize speedup and user engagement, and how that input should be used so as to insulate against possible errors.
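As one concrete example of a non-tabular representation, here is a sketch of value iteration with a linear value function over state features; the feature map, toy domain, and least-squares fitting are illustrative assumptions, not necessarily the dissertation's method:

```python
import numpy as np

def features(state):
    x, y = state
    return np.array([1.0, x, y, x * y])   # hypothetical feature map

def fitted_value_iteration(states, actions, step, reward, sweeps=50):
    """Approximate V(s) ~ w . phi(s) by repeatedly regressing onto
    Bellman backup targets instead of storing a table of values."""
    w = np.zeros(4)
    for _ in range(sweeps):
        phis, targets = [], []
        for s in states:
            # Bellman backup using the current approximate values
            backup = max(reward(s, a) + 0.95 * features(step(s, a)) @ w
                         for a in actions)
            phis.append(features(s))
            targets.append(backup)
        # least-squares fit of the value function to the backup targets
        w, *_ = np.linalg.lstsq(np.array(phis), np.array(targets), rcond=None)
    return w

# toy usage: a 5x5 grid walk rewarded for reaching the origin
states = [(x, y) for x in range(5) for y in range(5)]
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]
step = lambda s, a: (max(0, min(4, s[0] + a[0])), max(0, min(4, s[1] + a[1])))
reward = lambda s, a: 1.0 if step(s, a) == (0, 0) else 0.0
w = fitted_value_iteration(states, actions, step, reward)
```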
|
365 |
Application of reinforcement learning to multi-agent production scheduling
Wang, Yi-Chi, January 2003 (has links)
Thesis (Ph. D.)--Mississippi State University. Department of Industrial Engineering. / Title from title screen. Includes bibliographical references.
|
366 |
Texplore: temporal difference reinforcement learning for robots and time-constrained domains
Hester, Todd, 30 January 2013 (has links)
Robots have the potential to solve many problems in society because of their ability to work in dangerous places doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision-making processes and could solve the problems of learning and adaptation on robots.

This dissertation identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuous state features; 3) it must handle sensor and/or actuator delays; and 4) it should continually select actions in real time. This dissertation focuses on addressing all four of these challenges. In particular, it is focused on time-constrained domains where the first challenge is critically important. In these domains, the agent's lifetime is not long enough for it to explore the domain thoroughly, and it must learn in very few samples.

Although existing RL algorithms successfully address one or more of the RL for Robotics Challenges, no prior algorithm addresses all four of them. To fill this gap, this dissertation introduces TEXPLORE, the first algorithm to address all four challenges. TEXPLORE is a model-based RL method that learns a random forest model of the domain, which generalizes dynamics to unseen states. Each tree in the random forest model represents a hypothesis of the domain's true dynamics, and the agent uses these hypotheses to explore states that are promising for the final policy, while ignoring states that do not appear promising. With sample-based planning and a novel parallel architecture, TEXPLORE can select actions continually in real time whenever necessary. We empirically evaluate each component of TEXPLORE in comparison with other state-of-the-art approaches. In addition, we present modifications of TEXPLORE's exploration mechanism for different types of domains.

The key result of this dissertation is a demonstration of TEXPLORE learning to control the velocity of an autonomous vehicle on-line, in real time, while running on-board the robot. After controlling the vehicle for only two minutes, TEXPLORE is able to learn to move the pedals of the vehicle to drive at the desired velocities. The work presented in this dissertation represents an important step towards applying RL to robotics and enabling robots to perform more tasks in society. By enabling robots to learn in few actions while acting on-line in real time on robots with continuous state and actuator delays, TEXPLORE significantly broadens the applicability of RL to robots.
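A heavily compressed sketch of TEXPLORE's core idea: learn a random forest model of the dynamics, predicting relative state change so the model generalizes to unseen states. Here scikit-learn's RandomForestRegressor stands in for the thesis's forest of decision trees, and the real system's per-tree exploration hypotheses, sample-based planner, and parallel real-time architecture are omitted:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class ForestModel:
    def __init__(self):
        self.forest = RandomForestRegressor(n_estimators=10)
        self.X, self.y = [], []

    def add(self, state, action, next_state):
        # learn the state *change* rather than the absolute next state,
        # which generalizes better to unseen states; refitting on every
        # sample keeps the sketch short, a real system trains incrementally
        self.X.append(np.append(state, action))
        self.y.append(np.asarray(next_state) - np.asarray(state))
        self.forest.fit(self.X, self.y)

    def predict(self, state, action):
        delta = self.forest.predict([np.append(state, action)])[0]
        return np.asarray(state) + delta

model = ForestModel()
model.add([0.0, 0.0], 1, [1.0, 0.0])
model.add([1.0, 0.0], 1, [2.0, 0.0])
print(model.predict([2.0, 0.0], 1))   # forest predicts the learned +1 step
```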
|
367 |
Learning from human-generated reward
Knox, William Bradley, 15 February 2013 (has links)
Robots and other computational agents are increasingly becoming part of our daily lives. They will need to be able to learn to perform new tasks, adapt to novel situations, and understand what is wanted by their human users, most of whom will not have programming skills. To achieve these ends, agents must learn from humans using methods of communication that are naturally accessible to everyone. This thesis presents and formalizes interactive shaping, one such teaching method, where agents learn from real-valued reward signals that are generated by a human trainer. In interactive shaping, a human trainer observes an agent behaving in a task environment and delivers feedback signals. These signals are mapped to numeric values, which are used by the agent to specify correct behavior. A solution to the problem of interactive shaping maps human reward to some objective such that maximizing that objective generally leads to the behavior that the trainer desires.
Interactive shaping addresses the aforementioned needs of real-world agents. This teaching method allows human users to quickly teach agents the specific behaviors that they desire. Further, humans can shape agents without needing programming skills or even detailed knowledge of how to perform the task themselves. In contrast, algorithms that learn autonomously from only a pre-programmed evaluative signal often learn slowly, which is unacceptable for some real-world tasks with real-world costs. These autonomous algorithms additionally have an inflexibly defined set of optimal behaviors, changeable only through additional programming. Through interactive shaping, human users can (1) specify and teach desired behavior and (2) share task knowledge when correct behavior is already indirectly specified by an objective function. Additionally, computational agents that can be taught interactively by humans provide a unique opportunity to study how humans teach in a highly controlled setting, in which the computer agent’s behavior is parametrized.
This thesis answers the following question: how and to what extent can agents harness the information contained in human-generated signals of reward to learn sequential decision-making tasks? The contributions of this thesis begin with an operational definition of the problem of interactive shaping. Next, I introduce the TAMER framework, one solution to the problem of interactive shaping, and describe and analyze algorithmic implementations of the framework within multiple domains. This thesis also proposes and empirically examines algorithms for learning from both human reward and a pre-programmed reward function within an MDP, demonstrating two techniques that consistently outperform learning from either feedback signal alone. Subsequently, the thesis shifts its focus from the agent to the trainer, describing two psychological studies in which the trainer is manipulated, either by changing their perceived role or by having the agent intentionally misbehave at specific times; we examine the effect of these manipulations on trainer behavior and the agent's learned task performance. Lastly, I return to the problem of interactive shaping, for which we examine a space of mappings from human reward to objective functions, where mappings differ by how much the agent discounts reward it expects to receive in the future. Through this investigation, a deep relationship is identified between discounting, the level of positivity in human reward, and training success. Specific constraints of human reward are identified (i.e., the "positive circuits" problem), as are strategies for overcoming these constraints, pointing towards interactive shaping methods that are more effective than the already successful TAMER framework.
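A minimal sketch of the TAMER idea as described above: regress a model H(s, a) of the human's reward signal and act greedily on it, with no discounting of future reward. The linear model and one-step credit assignment are simplifications; Knox's implementations are more sophisticated:

```python
import numpy as np

class Tamer:
    def __init__(self, n_features, n_actions, lr=0.05):
        self.w = np.zeros((n_actions, n_features))  # H(s, a) = w_a . s
        self.lr = lr

    def act(self, state):
        # greedy on predicted human reward: no planning, no discounting
        return int(np.argmax(self.w @ state))

    def feedback(self, state, action, human_reward):
        # move the prediction for the credited (s, a) toward the signal
        error = human_reward - self.w[action] @ state
        self.w[action] += self.lr * error * state

agent = Tamer(n_features=3, n_actions=2)
s = np.array([1.0, 0.5, -0.2])
a = agent.act(s)
agent.feedback(s, a, human_reward=+1.0)   # trainer signals "good"
```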
|
368 |
Semi-Cooperative Learning in Smart Grid Agents
Reddy, Prashant P., 01 December 2013 (has links)
Striving to reduce the environmental impact of our growing energy demand creates tough new challenges in how we generate and use electricity. We need to develop Smart Grid systems in which distributed sustainable energy resources are fully integrated and energy consumption is efficient. Customers, i.e., consumers and distributed producers, require agent technology that automates much of their decision-making so that they can become active participants in the Smart Grid. This thesis develops models and learning algorithms for such autonomous agents in an environment where customers operate in modern retail power markets and thus have a choice of intermediary brokers with whom they can contract to buy or sell power. In this setting, customers face a learning and multiscale decision-making problem: they must manage contracts with one or more brokers and simultaneously, on a finer timescale, manage their consumption or production levels under existing contracts. On a contextual scale, they can optimize their isolated self-interest or consider their shared goals with other agents.

We advance the idea that a Learning Utility Management Agent (LUMA), or a network of such agents, deployed on behalf of a Smart Grid customer can autonomously address that customer's multiscale decision-making responsibilities. We study several relationships between a given LUMA and other agents in the environment. These relationships are semi-cooperative, and the degree of expected cooperation can change dynamically with the evolving state of the world. We exploit the multiagent structure of the problem to control the degree of partial observability. Since a large portion of relevant hidden information is visible to the other agents in the environment, we develop methods for Negotiated Learning, whereby a LUMA can offer incentives to the other agents to obtain information that sufficiently reduces its own uncertainty while trading off the cost of offering those incentives.

The thesis first introduces pricing algorithms for autonomous broker agents, time series forecasting models for long range simulation, and capacity optimization algorithms for multi-dwelling customers. We then introduce Negotiable Entity Selection Processes (NESP) as a formal representation where partial observability is negotiable amongst certain classes of agents. We then develop our ATTRACTION-BOUNDED-LEARNING algorithm, which leverages the variability of hidden information for efficient multiagent learning. We apply the algorithm to address the variable-rate tariff selection and capacity aggregate management problems faced by Smart Grid customers. We evaluate the work on real data using Power TAC, an agent-based Smart Grid simulation platform, and substantiate the value of autonomous Learning Utility Management Agents in the Smart Grid.
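A toy sketch of the attraction-bounds intuition: each candidate entity (e.g., a tariff) carries lower and upper attraction estimates, the agent switches only when a candidate's lower bound beats the incumbent's upper bound, and it may pay to negotiate for information that tightens overlapping bounds. This illustrates the concept only; it is not Reddy's ATTRACTION-BOUNDED-LEARNING algorithm:

```python
class Entity:
    def __init__(self, name, low, high, info_cost):
        self.name, self.low, self.high = name, low, high
        self.info_cost = info_cost    # price of negotiating for its data

def select(current, candidates, budget):
    """Switch on a dominating candidate; otherwise consider buying
    information to resolve overlapping attraction bounds."""
    for c in candidates:
        if c.low > current.high:          # clearly better: switch
            return c, budget
        if c.high > current.low and budget >= c.info_cost:
            budget -= c.info_cost         # buy information, tighten bounds
            mid = (c.low + c.high) / 2
            c.low, c.high = mid, mid      # stand-in for an observed value
            if c.low > current.high:
                return c, budget
    return current, budget

incumbent = Entity("flat_rate", 0.4, 0.6, 0.0)
best, left = select(incumbent, [Entity("variable_rate", 0.5, 0.9, 1.0)], budget=2.0)
print(best.name, left)   # -> variable_rate 1.0
```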
|
369 |
Automated domain analysis and transfer learning in general game playing
Kuhlmann, Gregory John, 13 December 2010 (has links)
Creating programs that can play games such as chess, checkers, and backgammon at a high level has long been a challenge and benchmark for AI. Computer game playing is arguably one of AI's biggest success stories. Several game playing systems developed in the past, such as Deep Blue, Chinook and TD-Gammon, have demonstrated competitive play against the top human players. However, such systems are limited in that they play only one particular game and they typically must be supplied with game-specific knowledge. While their performance is impressive, it is difficult to determine if their success is due to generally applicable techniques or due to the human game analysis.

A general game player is an agent capable of taking as input a description of a game's rules and proceeding to play without any subsequent human input. In doing so, the agent, rather than the human designer, is responsible for the domain analysis. Developing such a system requires the integration of several AI components, including theorem proving, feature discovery, heuristic search, and machine learning.

In the general game playing scenario, the player agent is supplied with a game's rules in a formal language, prior to match play. This thesis contributes a collection of general methods for analyzing these game descriptions to improve performance. Prior work on automated domain analysis has focused on generating heuristic evaluation functions for use in search. The thesis builds upon this work by introducing a novel feature generation method. Also, I introduce a method for generating and comparing simple evaluation functions based on these features. I describe how more sophisticated evaluation functions can be generated through learning. Finally, this thesis demonstrates the utility of domain analysis in facilitating knowledge transfer between games for improved learning speed. The contributions are fully implemented, with empirical results in the general game playing system.
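To make the evaluation-function idea concrete, here is a minimal sketch of a heuristic evaluator as a weighted combination of discovered state features; in the thesis both the features and the weights are derived automatically from the game description and from learning, whereas here they are hand-supplied placeholders:

```python
def make_evaluator(feature_fns, weights):
    """Heuristic evaluation as a weighted sum of state features."""
    def evaluate(state):
        return sum(w * f(state) for f, w in zip(feature_fns, weights))
    return evaluate

# hypothetical features for a board-game state given as a dict
piece_count = lambda s: len(s["my_pieces"]) - len(s["their_pieces"])
mobility    = lambda s: len(s["my_moves"])

evaluate = make_evaluator([piece_count, mobility], [1.0, 0.1])
state = {"my_pieces": [1, 2, 3], "their_pieces": [4], "my_moves": [0, 1]}
print(evaluate(state))   # 2 * 1.0 + 2 * 0.1 = 2.2
```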
|
370 |
An Integrated Simulation, Learning and Game-theoretic Framework for Supply Chain Competition
Xu, Dong, January 2014 (has links)
An integrated simulation, learning, and game-theoretic framework is proposed to address the dynamics of supply chain competition. The proposed framework is composed of 1) a simulation-based game platform, 2) a game solving and analysis module, and 3) a multi-agent reinforcement learning module. The simulation-based game platform supports multi-paradigm modeling, such as agent-based modeling, discrete-event simulation, and system dynamics modeling. The game solving and analysis module is designed to include various parts, including strategy refinement, data sampling, game solving, equilibrium conditions, solution evaluation, and comparative statics under varying parameter values. The learning module facilitates the decision-making of each supply chain competitor in stochastic and uncertain environments, considering different learning strategies.

The proposed integrated framework is illustrated for a supply chain system under the newsvendor problem setting in several phases. At phase 1, an extended newsvendor competition considering both the product sale price and service level under uncertain demand is studied. Assuming that each retailer has full knowledge of the other retailer's decision space and profit function, we derive the existence and uniqueness conditions of a pure-strategy Nash equilibrium with respect to price and service dominance under additive and multiplicative demand forms. Furthermore, we compare the bounds and obtain various managerial insights. At phase 2, to extend the number of decision variables and enrich the payoff function of the problem considered at phase 1, a hybrid simulation-based framework involving system dynamics and agent-based modeling is presented, followed by a novel game solving procedure whose components include strategy refinement, data sampling, game solving, and performance evaluation. Various numerical analyses based on the proposed procedure are presented, such as equilibrium accuracy, quality, and asymptotic/marginal stability. At phase 3, multi-agent reinforcement learning techniques are employed for competition scenarios under a partial/incomplete information setting, where each retailer can only observe the opponent's behaviors and adapt to them. Under such a setting, we study different learning policies and learning rates with different decay patterns between the two competitors. Furthermore, convergence issues are discussed as well. Finally, the best learning strategies under different problem scenarios are devised.
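For reference, a worked example of the classic newsvendor quantity that the competition model builds on: each retailer stocks at the critical fractile q* = F^{-1}(cu / (cu + co)). The cost figures and demand distribution below are purely illustrative:

```python
from scipy.stats import norm

price, cost, salvage = 10.0, 6.0, 2.0
cu = price - cost        # underage cost: margin lost per unit of unmet demand
co = cost - salvage      # overage cost: loss per leftover unit
fractile = cu / (cu + co)              # = 0.5 here

# demand ~ Normal(100, 20); the optimal order is the fractile quantile
q_star = norm.ppf(fractile, loc=100, scale=20)
print(round(fractile, 3), round(q_star, 1))   # 0.5 100.0
```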
|