351

Heterogeneous representations for reinforcement learning control of dynamic systems

McGarity, Michael, Computer Science & Engineering, Faculty of Engineering, UNSW January 2004 (has links)
Intelligent agents are designed to interact with, and learn about, their environment so that they can act purposefully towards a goal. One class of problems encountered in building such agents is learning how to respond to dynamic systems with a continuous state space. The goals of this dissertation are to develop a framework for understanding the behaviour of partitioned dynamic systems with continuous underlying state, and to translate this framework into algorithms which adaptively form a partition of the continuous space such that the partitioned system is more easily learned and controlled, and such that the control law may be easily explained in intuitive ways. Currently, algorithms which learn a control policy for partitioned continuous state space systems treat the partitioned system as an approximation to a Markov chain. I give conditions for the partitioned system to be a Markov chain, a semi-Markov process, and a new class of system, a weak-semi-Markov process. The weak-semi-Markov model is shown to model partitioned dynamic systems with greater economy than other surveyed models. The behaviour of a partitioned state space system in the area around the region boundaries is also considered. I use the theory of sliding surfaces and some heuristic arguments to recommend region boundary shape and position. The concept of 'staying on the boundary' then becomes a robust and relatively easy subgoal within the control algorithm. The concept of 'reaching the sliding surface' as a subgoal is used as the basis for an intuitive explanation of the learnt controller. I present an algorithm based on this concept which explains the behaviour of a learnt controller in ways not previously available to machine learning algorithms. Finally, the Markov property and the theory of sliding mode control are used as the basis of a class of recursive algorithms. These algorithms adaptively find a partition, and simultaneously use this partition in conjunction with one of five reinforcement learning algorithms to find a control policy based on that partition. This technique is shown to work very well in learning, controlling and explaining a variety of physical systems, from a monorail to a container crane.
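To make the core idea concrete, below is a minimal sketch of tabular Q-learning over a fixed partition of a continuous state space. The thesis learns the partition adaptively and combines it with sliding-mode reasoning; here the uniform grid, the toy one-dimensional plant, and all parameter values are purely illustrative assumptions.

```python
# Minimal sketch: tabular Q-learning over a fixed partition of a continuous
# state space. The grid, the toy dynamics, and all parameters are assumptions.
import numpy as np

def partition_index(x, x_min=-1.0, x_max=1.0, n_regions=20):
    """Map a continuous state to the index of the region that contains it."""
    i = int((x - x_min) / (x_max - x_min) * n_regions)
    return min(max(i, 0), n_regions - 1)

def step(x, u, dt=0.05):
    """Toy first-order plant: unstable open loop, control u in {-1, 0, +1}."""
    x_next = x + dt * (0.5 * x + u)
    reward = -abs(x_next)                 # stay near the origin
    return x_next, reward

actions = [-1.0, 0.0, 1.0]
Q = np.zeros((20, len(actions)))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    x = rng.uniform(-1.0, 1.0)
    for t in range(200):
        s = partition_index(x)
        a = int(rng.integers(len(actions))) if rng.random() < eps else int(np.argmax(Q[s]))
        x, r = step(x, actions[a])
        s_next = partition_index(x)
        # Treat the partitioned system as (approximately) Markov and apply the
        # standard one-step Q-learning backup on region indices.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
```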
352

Reinforcement learning-based resource allocation in cellular telecommunications systems

Lilith, Nimrod January 2005 (has links)
The work in this thesis concerns the use of reinforcement learning solutions to resource allocation problems in channelised cellular networks. The methodology of reinforcement learning techniques was chosen for application to these problems due to its capability of finding efficient policies in a fully on-line, adaptable manner, without requiring specific environment models. All of the presented agent architectures are assumed to simultaneously learn and perform network control functions in a totally on-line and unsupervised manner, and agents are developed with a view to real-world implementability by focussing on techniques that have low resource requirements and make use of only local system information.
353

Modeling Stock Order Flows and Learning Market-Making from Data

Kim, Adlar J., Shelton, Christian R. 01 June 2002 (has links)
Stock markets employ specialized traders, market-makers, designed to provide liquidity and volume to the market by constantly supplying both supply and demand. In this paper, we demonstrate a novel method for modeling the market as a dynamic system and a reinforcement learning algorithm that learns profitable market-making strategies when run on this model. We model the sequence of buys and sells for a particular stock, the order flow, as an Input-Output Hidden Markov Model fit to historical data. When combined with the dynamics of the order book, this creates a highly non-linear and difficult dynamic system. Our reinforcement learning algorithm, based on likelihood ratios, is run on this partially-observable environment. We demonstrate learning results for two separate real stocks.
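For illustration, the following is a minimal sketch of a likelihood-ratio (score-function) policy-gradient update, the family of algorithm the paper applies. The toy reward model and every parameter below are illustrative assumptions, not the authors' IOHMM order-flow model or order-book dynamics.

```python
# Minimal sketch of a REINFORCE-style likelihood-ratio update with a softmax
# policy and a running baseline. The toy "market" reward is an assumption.
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(3)                       # preferences over three quoting actions

def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def rollout(theta, horizon=50):
    """Sample one episode, returning the total reward and the score function."""
    score = np.zeros_like(theta)
    total = 0.0
    for _ in range(horizon):
        p = softmax(theta)
        a = rng.choice(len(theta), p=p)
        # Toy reward: action 1 ("quote at the mid") earns the spread on average.
        total += rng.normal(loc=(0.1 if a == 1 else 0.0), scale=0.5)
        # Accumulate grad log pi(a) -- the likelihood ratio.
        score += np.eye(len(theta))[a] - p
    return total, score

baseline = 0.0
for it in range(2000):
    R, score = rollout(theta)
    baseline += 0.01 * (R - baseline)     # running baseline reduces variance
    theta += 0.01 * (R - baseline) * score
```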
354

Game-independent AI agents for playing Atari 2600 console games

Naddaf, Yavar 06 1900 (has links)
This research focuses on developing AI agents that play arbitrary Atari 2600 console games without having any game-specific assumptions or prior knowledge. Two main approaches are considered: reinforcement learning based methods and search based methods. The RL-based methods use feature vectors generated from the game screen as well as the console RAM to learn to play a given game. The search-based methods use the emulator to simulate the consequences of actions into the future, aiming to play as well as possible by only exploring a very small fraction of the state-space. To ensure the generic nature of our methods, all agents are designed and tuned using four specific games. Once the development and parameter selection is complete, the performance of the agents is evaluated on a set of 50 randomly selected games. Significant learning is reported for the RL-based methods on most games. Additionally, some instances of human-level performance are achieved by the search-based methods.
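As a rough illustration of the search-based approach, the sketch below picks actions by simulating short random rollouts with a forward model standing in for the emulator. The toy dynamics, rollout depth, and rollout count are assumptions, not the agents developed in this work.

```python
# Minimal sketch: choose the action whose simulated futures score best.
import random

ACTIONS = range(4)

def simulate(state, action):
    """Stand-in for an emulator step: returns (next_state, reward)."""
    next_state = (state + action) % 100
    reward = 1.0 if next_state % 7 == 0 else 0.0
    return next_state, reward

def rollout_value(state, depth=10):
    """Return of one random rollout of fixed depth."""
    total = 0.0
    for _ in range(depth):
        state, r = simulate(state, random.choice(list(ACTIONS)))
        total += r
    return total

def search_action(state, rollouts_per_action=30):
    best_a, best_v = None, float("-inf")
    for a in ACTIONS:
        value = 0.0
        for _ in range(rollouts_per_action):
            s, r = simulate(state, a)
            value += r + rollout_value(s)
        if value > best_v:
            best_a, best_v = a, value
    return best_a

print(search_action(0))
```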
355

Revenue Management for Make-to-Order and Make-to-Stock Systems

Wang, Jiao 01 May 2011 (has links)
With the success of Revenue Management (RM) techniques over the past three decades in various segments of the service industry, many manufacturing firms have started exploring innovative RM technologies to improve their profits. This dissertation studies RM for make-to-order (MTO) and make-to-stock (MTS) systems. We start with a problem faced by an MTO firm that has the ability to accept or reject orders and to set prices and lead-times to influence demand. The firm must decide which orders to accept or reject, trading off price, lead-time, and the potential for increased demand against capacity constraints, in order to maximize total profit over a finite planning horizon with deterministic demand. We develop a mathematical model for this problem. Through numerical analysis, we present insights regarding the benefits of price customization and lead-time flexibility in various demand scenarios. However, the demand faced by MTO firms is hard to predict in most situations. We therefore further study the above problem under stochastic demand, with the objective of maximizing the long-run average profit. We model the problem as a Semi-Markov Decision Problem (SMDP) and develop a reinforcement learning (RL) algorithm, a Q-learning algorithm (QLA), in which a decision agent is assigned to the machine and improves the accuracy of its action-selection decisions via a "learning" process. Numerical experiments show the superior performance of the QLA. Finally, we consider an MTS production system consisting of a single machine, in which the demands and processing times for N types of products are random. The problem is to decide when, what, and how much to produce so that the long-run average profit is maximized. We develop a mathematical model and propose two RL algorithms for real-time decision-making: one is a Q-learning algorithm for Semi-Markov decision processes (QLS) and the other is a Q-learning algorithm with a learning-improvement heuristic (QLIH) that further improves the performance of QLS. We compare the performance of QLS and QLIH with a benchmark Brownian policy and the first-come-first-served policy. The numerical results show that QLIH outperforms QLS and both benchmark policies.
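For context, the sketch below shows one common form of average-reward SMDP Q-learning, the style of algorithm a QLA of this kind builds on: the backup subtracts the estimated profit rate multiplied by the sojourn time of each decision epoch. The toy order process, the state and action sets, and all parameters are illustrative assumptions, not the dissertation's model.

```python
# Minimal sketch of average-reward SMDP Q-learning with a running reward-rate
# estimate updated on greedy steps. Environment and parameters are assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions = 5, 2                  # order backlog level x {reject, accept}
Q = np.zeros((n_states, n_actions))
alpha, eps = 0.1, 0.1
total_profit, total_time, rho = 0.0, 0.0, 0.0

def sample_transition(s, a):
    """Stand-in environment: returns (next_state, profit, sojourn_time)."""
    profit = 5.0 * a - 1.0 * s              # accepting earns revenue, backlog costs
    tau = float(rng.exponential(1.0 + s))   # time until the next order arrives
    s_next = min(n_states - 1, max(0, s + (1 if a == 1 else -1)))
    return s_next, profit, tau

s = 0
for _ in range(50_000):
    greedy_a = int(np.argmax(Q[s]))
    a = int(rng.integers(n_actions)) if rng.random() < eps else greedy_a
    s_next, r, tau = sample_transition(s, a)
    # SMDP backup: subtract the estimated profit rate over the sojourn time.
    Q[s, a] += alpha * (r - rho * tau + np.max(Q[s_next]) - Q[s, a])
    if a == greedy_a:                       # update the rate estimate on greedy steps
        total_profit += r
        total_time += tau
        rho = total_profit / total_time
    s = s_next
```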
356

Dynamic Cooperative Secondary Access in Hierarchical Spectrum Sharing Networks

Wang, Liping, Fodor, Viktoria Unknown Date (has links)
We consider a hierarchical spectrum sharing network consisting of a primary and a cognitive secondary transmitter-receiver pair, with non-backlogged traffic. The secondary transmitter may utilize cooperative transmission techniques to relay primary traffic while superimposing its own information, or transmit opportunistically when the primary user is idle. The secondary user faces a dilemma in this scenario: choosing cooperation, it can transmit a packet immediately even if the primary queue is not empty, but it has to bear the additional cost of relaying, since the primary performance needs to be guaranteed. To solve this dilemma we propose dynamic cooperative secondary access control that takes the state of the spectrum sharing network into account. We formulate the problem as a Markov Decision Process (MDP) and prove the existence of a stationary policy that is average cost optimal. We then consider the scenario where the traffic and link statistics are not known at the secondary user, and propose to find the optimal transmission strategy using reinforcement learning. With extensive numerical evaluation, we demonstrate that dynamic cooperation with state-aware sequential decisions is very efficient in spectrum sharing systems with stochastic traffic, and show that dynamic cooperation is necessary for the secondary system to be able to adapt to changing load conditions or to a changing available energy resource. Our results show that learning-based access control, with or without a known primary buffer state, has close to optimal performance. / QS 2013
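As background for the MDP formulation, the sketch below runs relative value iteration on a tiny average-cost MDP with a "wait" versus "cooperate" decision. The two-state model, costs, and transition probabilities are illustrative assumptions, not the paper's system model; the paper additionally replaces such planning with reinforcement learning when the statistics are unknown.

```python
# Minimal sketch: relative value iteration for an average-cost MDP with two
# states (primary idle/busy) and two actions (wait/cooperate). All numbers
# below are assumptions chosen only to make the example run.
import numpy as np

actions = ["wait", "cooperate"]
# P[name][s, s'] : transition probabilities; C[s, a] : one-step cost.
P = {
    "wait":      np.array([[0.7, 0.3],
                           [0.2, 0.8]]),
    "cooperate": np.array([[0.7, 0.3],
                           [0.6, 0.4]]),   # relaying drains the primary queue faster
}
C = np.array([[0.0, 0.5],    # idle: cooperating wastes relay energy
              [1.0, 0.6]])   # busy: waiting delays the secondary packet
h = np.zeros(2)              # relative value function

for _ in range(500):
    Qv = np.array([[C[s, a] + P[actions[a]][s] @ h for a in range(2)]
                   for s in range(2)])
    h_new = Qv.min(axis=1)
    h = h_new - h_new[0]     # subtract the reference state's value

policy = [actions[int(np.argmin([C[s, a] + P[actions[a]][s] @ h for a in range(2)]))]
          for s in range(2)]
print(policy)                # with these numbers: wait when idle, cooperate when busy
```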
357

Learning with ALiCE II

Lockery, Daniel Alexander 14 September 2007 (has links)
The problem considered in this thesis is the development of an autonomous prototype robot capable of gathering sensory information from its environment, allowing it to provide feedback on the condition of specific targets to aid in the maintenance of hydro equipment. The context for the solution to this problem is the power grid environment operated by the local hydro utility. The intent is to monitor power line structures by travelling along the skywire located at the top of towers, providing a view of everything beneath it, including, for example, insulators, conductors, and towers. The contribution of this thesis is a novel robot design with the potential to prevent hazardous situations, together with the use of reinforcement learning algorithms modified with rough coverage feedback to establish behaviours. / October 2007
358

Reinforcement learning in biologically-inspired collective robotics: a rough set approach

Henry, Christopher 19 September 2006 (has links)
This thesis presents a rough set approach to reinforcement learning. This is made possible by considering behaviour patterns of learning agents in the context of approximation spaces. Rough set theory, introduced by Zdzisław Pawlak in the early 1980s, provides a basis for deriving pattern-based rewards within approximation spaces. Learning can be considered episodic. The framework provided by an approximation space makes it possible to derive pattern-based reference rewards at the end of each episode. Reference rewards provide a standard for reinforcement comparison as well as for the actor-critic method of reinforcement learning. In addition, approximation spaces provide a basis for deriving episodic weights that underpin a new form of off-policy Monte Carlo learning control method. A number of conventional and pattern-based reinforcement learning methods are investigated in this thesis. In addition, this thesis introduces two learning environments used to compare the algorithms. The first is a Monocular Vision System used to track a moving target. The second is an artificial ecosystem testbed that makes it possible to study swarm behaviour by collections of biologically-inspired bots. The simulated ecosystem has an ethological basis inspired by the work of Niko Tinbergen, who introduced in the 1960s methods of observing and explaining the behaviour of biological organisms that carry over into the study of the behaviour of interacting robotic devices that cooperate to survive and to carry out highly specialized tasks. Agent behaviour during each episode is recorded in a decision table called an ethogram, which records features such as states, proximate causes, responses (actions), action preferences, rewards and decisions (actions chosen and actions rejected). At all times an agent follows a policy that maps perceived states of the environment to actions. The goal of the learning algorithms is to find an optimal policy in a non-stationary environment. The results of the learning experiments with seven forms of reinforcement learning are given. The contribution of this thesis is a comprehensive introduction to a pattern-based evaluation of behaviour during reinforcement learning using approximation spaces. / May 2006
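To illustrate the role of a reference reward, the sketch below implements classical reinforcement comparison on a toy bandit: each action preference moves according to how the received reward compares with a reference. In the thesis the reference reward is derived from behaviour patterns in an approximation space, whereas here it is simply a running average; the bandit task and all parameters are assumptions.

```python
# Minimal sketch of reinforcement comparison: preferences rise when the reward
# beats the reference reward and fall otherwise. Task and parameters assumed.
import numpy as np

rng = np.random.default_rng(3)
true_means = np.array([0.2, 0.5, 0.8])    # toy 3-armed bandit
prefs = np.zeros(3)                        # action preferences
ref_reward = 0.0                           # reference (baseline) reward
alpha, beta = 0.1, 0.1

def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

for t in range(2000):
    p = softmax(prefs)
    a = rng.choice(3, p=p)
    r = rng.normal(true_means[a], 0.1)
    prefs[a] += beta * (r - ref_reward)    # compare against the reference
    ref_reward += alpha * (r - ref_reward) # update the reference itself

print(np.argmax(prefs))                    # usually the best arm (index 2)
```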
359

Oppositional Reinforcement Learning with Applications

Shokri, Maryam 05 September 2008 (has links)
Machine intelligence techniques contribute to solving real-world problems. Reinforcement learning (RL) is one of these techniques, with several characteristics that make it suitable for applications in which a model of the environment is not available to the agent. In real-world applications, intelligent agents generally face a very large state space, which limits the usability of reinforcement learning. The condition for convergence of reinforcement learning requires that each state-action pair be visited infinitely often, a condition which can be considered impossible to satisfy in many practical situations. The goal of this work is to propose a class of new techniques to overcome this problem for off-policy, step-by-step (incremental) and model-free reinforcement learning with discrete state and action spaces. The focus of this research is on using the design characteristics of the RL agent to improve its performance in terms of running time while maintaining an acceptable level of accuracy. One way of improving the performance of intelligent agents is to use a model of the environment. In this work, a special type of knowledge about the agent's actions is employed instead, because in many applications the model of the environment may be known only partially or not at all. The concept of opposition is employed in the framework of reinforcement learning to achieve this goal. One of the components of an RL agent is the action. For each action we define its associated opposite action. The actions and opposite actions are implemented in the framework of reinforcement learning to update the value function, resulting in faster convergence. At the beginning of this research the concept of opposition is incorporated in the components of reinforcement learning (states, actions, and the reinforcement signal), which results in the introduction of the oppositional target domain estimation algorithm, OTE. OTE reduces the search and navigation area and accelerates the speed of search for a target. The OTE algorithm is limited to applications in which a model of the environment is provided for the agent. Hence, further investigation is conducted to extend the concept of opposition to model-free reinforcement learning algorithms. This extension leads to several algorithms based on the concept of opposition for the Q(lambda) technique. The design of a reinforcement learning agent depends on the application; the emphasis of this research is on the characteristics of the actions. Hence, the primary challenge of this work is the design and incorporation of opposite actions in the framework of RL agents. In this research, three different applications, namely grid navigation, an elevator control problem, and image thresholding, are implemented to address this challenge in the context of different applications. The design challenges and some solutions to overcome the problems and improve the algorithms are also investigated. The opposition-based Q(lambda) algorithms are tested on the applications mentioned earlier. The general idea behind the opposition-based Q(lambda) algorithms is that in Q-value updating, the agent updates the value of an action in a given state. Hence, if the agent knows the value of the opposite action, then instead of one value, the agent can update two Q-values at the same time, without taking the corresponding opposite action or causing an explicit transition to the opposite state.
If the agent knows the values of both an action and its opposite action for a given state, then it can update two Q-values. This accelerates the learning process in general and the exploration phase in particular. Several algorithms are outlined in this work. OQ(lambda) is introduced to accelerate the Q(lambda) algorithm in discrete state spaces. The NOQ(lambda) method is an extension of OQ(lambda) that operates in a broader range of non-deterministic environments. The update of the opposition trace in OQ(lambda) depends on the next state of the opposite action (which generally is not taken by the agent). This limits the usability of the technique to deterministic environments, because the next state must be known to the agent. NOQ(lambda) is presented to update the opposition trace independently of knowing the next state of the opposite action. The results show an improvement in performance, in terms of running time, for the proposed algorithms compared to the standard Q(lambda) technique.
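A minimal sketch of the opposition idea follows, using a one-dimensional grid where each action has a defined opposite and every observed step updates two Q-values. As the abstract notes, exploiting the opposite action's outcome without taking it requires that outcome to be known, so the sketch uses a deterministic grid; the environment, rewards, and parameters are assumptions, and this is not the OQ(lambda) or NOQ(lambda) algorithm itself.

```python
# Minimal sketch of opposition-based Q-learning: each step updates the taken
# action's Q-value and its opposite's, without executing the opposite action.
import numpy as np

n_cells, GOAL = 10, 9
LEFT, RIGHT = 0, 1
OPPOSITE = {LEFT: RIGHT, RIGHT: LEFT}
Q = np.zeros((n_cells, 2))
alpha, gamma, eps = 0.2, 0.95, 0.1
rng = np.random.default_rng(4)

def step(s, a):
    s_next = max(0, min(n_cells - 1, s + (1 if a == RIGHT else -1)))
    reward = 1.0 if s_next == GOAL else -0.01
    return s_next, reward

for episode in range(300):
    s = 0
    for t in range(100):
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        # Opposition update: in this deterministic grid the opposite action's
        # outcome is known, so its Q-value is updated without being executed.
        a_opp = OPPOSITE[a]
        s_opp, r_opp = step(s, a_opp)
        Q[s, a_opp] += alpha * (r_opp + gamma * np.max(Q[s_opp]) - Q[s, a_opp])
        s = s_next
        if s == GOAL:
            break
```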
360

Storage System Management Using Reinforcement Learning Techniques and Nonlinear Models

Mahootchi, Masoud January 2009 (has links)
In this thesis, modeling and optimization in the field of storage management under stochastic conditions will be investigated using two different methodologies: Simulation Optimization Techniques (SOT), which are usually categorized in the area of Reinforcement Learning (RL), and Nonlinear Modeling Techniques (NMT). For the first set of methods, simulation plays a fundamental role in evaluating the control policy: learning techniques are used to deliver sub-optimal policies at the end of a learning process. These iterative methods use the interaction of agents with the stochastic environment through taking actions and observing different states. To converge to the steady-state condition, where policies and value functions do not change significantly as the learning process continues, all or most important states must be visited sufficiently often. This might be prohibitively time-consuming for large-scale problems. To make these techniques more efficient, in terms of both computation time and the robustness of the resulting policies, the idea of Opposition-Based Learning (OBL, Type I and Type II) is employed to modify/extend popular RL techniques including Q-learning, Q(λ), sarsa, and sarsa(λ). Several new algorithms are developed using this idea. It is also illustrated that function approximation techniques such as neural networks can contribute to the learning process. State-of-the-art implementations usually consider maximization of the expected value of accumulated reward. Extending these techniques to consider risk, and solving some well-known control problems, are important contributions of this thesis. Furthermore, the nonlinear model for reservoir management using indicator functions and a randomized policy, introduced by Fletcher and Ponnambalam, is extended to stochastic releases in multi-reservoir systems. In this extension, two different approaches for defining the release policies are proposed. In addition, the main restriction of assuming a normal distribution for inflow is relaxed by using a beta-equivalent general distribution. A five-reservoir case study from India is used to demonstrate the benefits of these new developments. Using a warehouse management problem as an example, the application of the proposed method to other storage management problems is outlined.
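For reference, the sketch below shows tabular sarsa(λ) with replacing eligibility traces, one of the base techniques the thesis modifies with opposition-based learning. The toy reservoir-like environment (discretized storage level, release action, random inflow) and all parameter values are illustrative assumptions, not the thesis's reservoir model.

```python
# Minimal sketch of sarsa(lambda) with replacing eligibility traces on a toy
# storage problem. Dynamics, rewards, and parameters are assumptions; the toy
# model ignores availability constraints on the release.
import numpy as np

rng = np.random.default_rng(5)
n_levels, n_actions = 10, 3                 # storage buckets x release choices {0, 1, 2}
Q = np.zeros((n_levels, n_actions))
alpha, gamma, lam, eps = 0.1, 0.95, 0.9, 0.1

def step(level, release):
    inflow = int(rng.integers(0, 3))        # random inflow of 0-2 units
    raw = level + inflow - release
    spill = max(0, raw - (n_levels - 1))    # water lost over the top of the reservoir
    next_level = int(np.clip(raw, 0, n_levels - 1))
    reward = 1.0 * release - 2.0 * spill    # revenue from release minus spill penalty
    return next_level, reward

def choose(s):
    return int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))

for episode in range(500):
    E = np.zeros_like(Q)                    # eligibility traces
    s = int(rng.integers(n_levels))
    a = choose(s)
    for t in range(100):
        s_next, r = step(s, a)
        a_next = choose(s_next)
        delta = r + gamma * Q[s_next, a_next] - Q[s, a]
        E[s, a] = 1.0                       # replacing trace
        Q += alpha * delta * E
        E *= gamma * lam
        s, a = s_next, a_next
```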
