51 |
Learning user modelling strategies for adaptive referring expression generation in spoken dialogue systemsJanarthanam, Srinivasan Chandrasekaran January 2011 (has links)
We address the problem of dynamic user modelling for referring expression generation in spoken dialogue systems, i.e how a spoken dialogue system should choose referring expressions to refer to domain entities to users with different levels of domain expertise, whose domain knowledge is initially unknown to the system. We approach this problem using a statistical planning framework: Reinforcement Learning techniques in Markov Decision Processes (MDP). We present a new reinforcement learning framework to learn user modelling strategies for adaptive referring expression generation (REG) in resource scarce domains (i.e. where no large corpus exists for learning). As a part of the framework, we present novel user simulation models that are sensitive to the referring expressions used by the system and are able to simulate users with different levels of domain knowledge. Such models are shown to simulate real user behaviour more closely than baseline user simulation models. In contrast to previous approaches to user adaptive systems, we do not assume that the user’s domain knowledge is available to the system before the conversation starts. We show that using a small corpus of non-adaptive dialogues it is possible to learn an adaptive user modelling policy in resource scarce domains using our framework. We also show that the learned user modelling strategies performed better in terms of adaptation than hand-coded baselines policies on both simulated and real users. With real users, the learned policy produced around 20% increase in adaptation in comparison to the best performing hand-coded adaptive baseline. We also show that adaptation to user’s domain knowledge results in improving task success (99.47% for learned policy vs 84.7% for hand-coded baseline) and reducing dialogue time of the conversation (11% relative difference). This is because users found it easier to identify domain objects when the system used adaptive referring expressions during the conversations.
|
52 |
Neural mechanisms of suboptimal decisionsChau, Ka Hung Bolton January 2014 (has links)
Making good decisions and adapting flexibly to environmental change are critical to the survival of animals. In this thesis, I investigated neural mechanisms underlying suboptimal decision making in humans and underlying behavioural adaptation in monkeys with the use of functional magnetic resonance imaging (fMRI) in both species. In recent decades, in the neuroscience of decision making, there has been a prominent focus on binary decisions. Whether the presence of an additional third option could have an impact on behaviour and neural signals has been largely overlooked. I designed an experiment in which decisions were made between two options in the presence of a third option. A biophysical model simulation made surprising predictions that more suboptimal decisions were made in the presence of a very poor third alternative. Subsequent human behavioural testing showed consistent results with these predictions. In the ventromedial prefrontal cortex (vmPFC), I found that a value comparison signal that is critical for decision making became weaker in the presence of a poor value third option. The effect contrasts with another prominent potential mechanism during multi-alternative decision making – divisive normalization – the signatures of which were observed in the posterior parietal cortex. It has long been thought that the orbitofrontal cortex (OFC) and amygdala mediate reward-guided behavioural adaptation. However, this viewpoint has been recently challenged. I recorded whole brain activity in macaques using fMRI while they performed an object discrimination reversal task over multiple testing sessions. I identified a lateral OFC (lOFC) region in which activity predicted adaptive win-stay/lose-shift behaviour. In contrast, anterior cingulate cortex (ACC) activity predicted future exploratory decisions regardless of reward outcome. Amygdala and lOFC activity was more strongly coupled for adaptive choice shifting and decoupled for task irrelevant reward memory. Day-to-day fluctuations in signals and signal coupling were correlated with day-to-day fluctuations in performance. These data demonstrate OFC, ACC, and amygdala each make unique contributions to flexible behaviour and credit assignment.
|
53 |
The behavior of institutional investors in IPO markets and the decision of going public abroadFu, Youyan January 2016 (has links)
This thesis comprehensively studies three questions. First of all, I use a unique set of institutional investor bids to examine the impact of personal experience on the behavior of institutional investors in an IPO market. I find that, when deciding to participate in future IPOs, institutions take into account initial returns of past IPOs in which they submitted bids more than IPOs which they merely observed. In addition, initial returns from past IPOs in which institutions’ bids were qualified for share allocation were given more consideration than IPOs for which unqualified bids were submitted. This phenomenon is consistent with reinforcement learning. I also find that institutions do not distinguish the returns that are derived from random events. Furthermore, institutions become more aggressive bidders after experiencing high returns in recent IPOs, conditional on personal participation or being qualified for share allocation in those IPOs. This bidding behavior provides additional evidence of reinforcement learning in IPO markets. Secondly, I merge the dataset of institutional investor bids with post-IPO institutional holdings data to examine whether institutional investors such as fund companies reveal their true valuations through bids in a unique quasi-bookbuilding IPO mechanism. I find that fund companies do truthfully disclose their private information via bids, despite these being without guaranteed compensation. My results contribute to the existing literature by providing new evidence on the information compensation theory and have implications for the IPO mechanism design. Finally, I explore the impact on firm valuation of going public abroad using a sample of 136 Chinese firms that conducted IPOs in the US during the period of 1999-2012. I find that US-listed Chinese firms have higher price multiples and experience less underpricing than their domestic-listed peers. The valuation premium stays consistent when a firm’s characteristics and listing cost are being controlled. These findings are consistent with the theories of foreign listing. Moreover, I find that high-tech Chinese firms with a high growth rate but low profitability are more likely to issue shares in the US, particularly for specific industries such as semiconductors, software and online business services. This industry clustering is interpreted as an incentive to access foreign expertise through listing abroad.
|
54 |
An architecture for situated learning agentsMitchell, Matthew Winston, 1968- January 2003 (has links)
Abstract not available
|
55 |
Reinforcement learning by incremental patchingKim, Min Sub, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links)
This thesis investigates how an autonomous reinforcement learning agent can improve on an approximate solution by augmenting it with a small patch, which overrides the approximate solution at certain states of the problem. In reinforcement learning, many approximate solutions are smaller and easier to produce than ???flat??? solutions that maintain distinct parameters for each fully enumerated state, but the best solution within the constraints of the approximation may fall well short of global optimality. This thesis proposes that the remaining gap to global optimality can be efficiently minimised by learning a small patch over the approximate solution. In order to improve the agent???s behaviour, algorithms are presented for learning the overriding patch. The patch is grown around particular regions of the problem where the approximate solution is found to be deficient. Two heuristic strategies are proposed for concentrating resources to those areas where inaccuracies in the approximate solution are most costly, drawing a compromise between solution quality and storage requirements. Patching also handles problems with continuous state variables, by two alternative methods: Kuhn triangulation over a fixed discretisation and nearest neighbour interpolation with a variable discretisation. As well as improving the agent???s behaviour, patching is also applied to the agent???s model of the environment. Inaccuracies in the agent???s model of the world are detected by statistical testing, using a selective sampling strategy to limit storage requirements for collecting data. The patching algorithms are demonstrated in several problem domains, illustrating the effectiveness of patching under a wide range of conditions. A scenario drawn from a real-time strategy game demonstrates the ability of patching to handle large complex tasks. These contributions combine to form a general framework for patching over approximate solutions in reinforcement learning. Complex problems cannot be solved by brute force alone, and some form of approximation is necessary to handle large problems. However, this does not mean that the limitations of approximate solutions must be accepted without question. Patching demonstrates one way in which an agent can leverage approximation techniques without losing the ability to handle fine yet important details.
|
56 |
Modelling motivation for experience-based attention focus in reinforcement learningMerrick, Kathryn January 2007 (has links)
Doctor of Philosophy / Computational models of motivation are software reasoning processes designed to direct, activate or organise the behaviour of artificial agents. Models of motivation inspired by psychological motivation theories permit the design of agents with a key reasoning characteristic of natural systems: experience-based attention focus. The ability to focus attention is critical for agent behaviour in complex or dynamic environments where only small amounts of available information is relevant at a particular time. Furthermore, experience-based attention focus enables adaptive behaviour that focuses on different tasks at different times in response to an agent’s experiences in its environment. This thesis is concerned with the synthesis of motivation and reinforcement learning in artificial agents. This extends reinforcement learning to adaptive, multi-task learning in complex, dynamic environments. Reinforcement learning algorithms are computational approaches to learning characterised by the use of reward or punishment to direct learning. The focus of much existing reinforcement learning research has been on the design of the learning component. In contrast, the focus of this thesis is on the design of computational models of motivation as approaches to the reinforcement component that generates reward or punishment. The primary aim of this thesis is to develop computational models of motivation that extend reinforcement learning with three key aspects of attention focus: rhythmic behavioural cycles, adaptive behaviour and multi-task learning in complex, dynamic environments. This is achieved by representing such environments using context-free grammars, modelling maintenance tasks as observations of these environments and modelling achievement tasks as events in these environments. Motivation is modelled by processes for task selection, the computation of experience-based reward signals for different tasks and arbitration between reward signals to produce a motivation signal. Two specific models of motivation based on the experience-oriented psychological concepts of interest and competence are designed within this framework. The first models motivation as a function of environmental experiences while the second models motivation as an introspective process. This thesis synthesises motivation and reinforcement learning as motivated reinforcement learning agents. Three models of motivated reinforcement learning are presented to explore the combination of motivation with three existing reinforcement learning components. The first model combines motivation with flat reinforcement learning for highly adaptive learning of behaviours for performing multiple tasks. The second model facilitates the recall of learned behaviours by combining motivation with multi-option reinforcement learning. In the third model, motivation is combined with an hierarchical reinforcement learning component to allow both the recall of learned behaviours and the reuse of these behaviours as abstract actions for future learning. Because motivated reinforcement learning agents have capabilities beyond those of existing reinforcement learning approaches, new techniques are required to measure their performance. The secondary aim of this thesis is to develop metrics for measuring the performance of different computational models of motivation with respect to the adaptive, multi-task learning they motivate. This is achieved by analysing the behaviour of motivated reinforcement learning agents incorporating different motivation functions with different learning components. Two new metrics are introduced that evaluate the behaviour learned by motivated reinforcement learning agents in terms of the variety of tasks learned and the complexity of those tasks. Persistent, multi-player computer game worlds are used as the primary example of complex, dynamic environments in this thesis. Motivated reinforcement learning agents are applied to control the non-player characters in games. Simulated game environments are used for evaluating and comparing motivated reinforcement learning agents using different motivation and learning components. The performance and scalability of these agents are analysed in a series of empirical studies in dynamic environments and environments of progressively increasing complexity. Game environments simulating two types of complexity increase are studied: environments with increasing numbers of potential learning tasks and environments with learning tasks that require behavioural cycles comprising more actions. A number of key conclusions can be drawn from the empirical studies, concerning both different computational models of motivation and their combination with different reinforcement learning components. Experimental results confirm that rhythmic behavioural cycles, adaptive behaviour and multi-task learning can be achieved using computational models of motivation as an experience-based reward signal for reinforcement learning. In dynamic environments, motivated reinforcement learning agents incorporating introspective competence motivation adapt more rapidly to change than agents motivated by interest alone. Agents incorporating competence motivation also scale to environments of greater complexity than agents motivated by interest alone. Motivated reinforcement learning agents combining motivation with flat reinforcement learning are the most adaptive in dynamic environments and exhibit scalable behavioural variety and complexity as the number of potential learning tasks is increased. However, when tasks require behavioural cycles comprising more actions, motivated reinforcement learning agents using a multi-option learning component exhibit greater scalability. Motivated multi-option reinforcement learning also provides a more scalable approach to recall than motivated hierarchical reinforcement learning. In summary, this thesis makes contributions in two key areas. Computational models of motivation and motivated reinforcement learning extend reinforcement learning to adaptive, multi-task learning in complex, dynamic environments. Motivated reinforcement learning agents allow the design of non-player characters for computer games that can progressively adapt their behaviour in response to changes in their environment.
|
57 |
Hierarchical average reward reinforcement learningSeri, Sandeep 15 March 2002 (has links)
Reinforcement Learning (RL) is the study of agents that learn optimal
behavior by interacting with and receiving rewards and punishments from an unknown
environment. RL agents typically do this by learning value functions that
assign a value to each state (situation) or to each state-action pair. Recently,
there has been a growing interest in using hierarchical methods to cope with the
complexity that arises due to the huge number of states found in most interesting
real-world problems. Hierarchical methods seek to reduce this complexity by the
use of temporal and state abstraction. Like most RL methods, most hierarchical
RL methods optimize the discounted total reward that the agent receives. However,
in many domains, the proper criteria to optimize is the average reward per
time step.
In this thesis, we adapt the concepts of hierarchical and recursive optimality,
which are used to describe the kind of optimality achieved by hierarchical methods,
to the average reward setting and show that they coincide under a condition called
Result Distribution Invariance. We present two new model-based hierarchical RL
methods, HH-learning and HAH-learning, that are intended to optimize the average
reward. HH-learning is a hierarchical extension of the model-based, average-reward RL method, H-learning. Like H-learning, HH-learning requires exploration
in order to learn correct domain models and optimal value function. HH-learning
can be used with any exploration strategy whereas HAH-learning uses the principle
of "optimism under uncertainty", which gives it a built-in "auto-exploratory"
feature. We also give the hierarchical and auto-exploratory hierarchical versions
of R-learning, a model-free average reward method, and a hierarchical version of
ARTDP, a model-based discounted total reward method.
We compare the performance of the "flat" and hierarchical methods in the
task of scheduling an Automated Guided Vehicle (AGV) in a variety of settings.
The results show that hierarchical methods can take advantage of temporal and
state abstraction and converge in fewer steps than the flat methods. The exception
is the hierarchical version of ARTDP. We give an explanation for this anomaly.
Auto-exploratory hierarchical methods are faster than the hierarchical methods
with ��-greedy exploration. Finally, hierarchical model-based methods are faster
than hierarchical model-free methods. / Graduation date: 2003
|
58 |
Learning with Deictic RepresentationFinney, Sarah, Gardiol, Natalia H., Kaelbling, Leslie Pack, Oates, Tim 10 April 2002 (has links)
Most reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Using a deictic representation is believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet, there are few experiments on learning with deictic representations reported in the literature. In this paper we explore the effectiveness of two forms of deictic representation and a naive propositional representation in a simple blocks-world domain. We find, empirically, that the deictic representations actually worsen performance. We conclude with a discussion of possible causes of these results and strategies for more effective learning in domains with objects.
|
59 |
A Reinforcement-Learning Approach to Power ManagementSteinbach, Carl 01 May 2002 (has links)
We describe an adaptive, mid-level approach to the wireless device power management problem. Our approach is based on reinforcement learning, a machine learning framework for autonomous agents. We describe how our framework can be applied to the power management problem in both infrastructure and ad~hoc wireless networks. From this thesis we conclude that mid-level power management policies can outperform low-level policies and are more convenient to implement than high-level policies. We also conclude that power management policies need to adapt to the user and network, and that a mid-level power management framework based on reinforcement learning fulfills these requirements.
|
60 |
On the Convergence of Stochastic Iterative Dynamic Programming AlgorithmsJaakkola, Tommi, Jordan, Michael I., Singh, Satinder P. 01 August 1993 (has links)
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
|
Page generated in 0.4197 seconds