51

The behavior of institutional investors in IPO markets and the decision of going public abroad

Fu, Youyan January 2016
This thesis comprehensively studies three questions. First of all, I use a unique set of institutional investor bids to examine the impact of personal experience on the behavior of institutional investors in an IPO market. I find that, when deciding to participate in future IPOs, institutions take into account initial returns of past IPOs in which they submitted bids more than IPOs which they merely observed. In addition, initial returns from past IPOs in which institutions’ bids were qualified for share allocation were given more consideration than IPOs for which unqualified bids were submitted. This phenomenon is consistent with reinforcement learning. I also find that institutions do not distinguish the returns that are derived from random events. Furthermore, institutions become more aggressive bidders after experiencing high returns in recent IPOs, conditional on personal participation or being qualified for share allocation in those IPOs. This bidding behavior provides additional evidence of reinforcement learning in IPO markets. Secondly, I merge the dataset of institutional investor bids with post-IPO institutional holdings data to examine whether institutional investors such as fund companies reveal their true valuations through bids in a unique quasi-bookbuilding IPO mechanism. I find that fund companies do truthfully disclose their private information via bids, despite these being without guaranteed compensation. My results contribute to the existing literature by providing new evidence on the information compensation theory and have implications for the IPO mechanism design. Finally, I explore the impact on firm valuation of going public abroad using a sample of 136 Chinese firms that conducted IPOs in the US during the period of 1999-2012. I find that US-listed Chinese firms have higher price multiples and experience less underpricing than their domestic-listed peers. 
The valuation premium stays consistent when a firm’s characteristics and listing cost are being controlled. These findings are consistent with the theories of foreign listing. Moreover, I find that high-tech Chinese firms with a high growth rate but low profitability are more likely to issue shares in the US, particularly for specific industries such as semiconductors, software and online business services. This industry clustering is interpreted as an incentive to access foreign expertise through listing abroad.
52

An architecture for situated learning agents

Mitchell, Matthew Winston, 1968- January 2003
Abstract not available
53

Reinforcement learning by incremental patching

Kim, Min Sub, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007
This thesis investigates how an autonomous reinforcement learning agent can improve on an approximate solution by augmenting it with a small patch, which overrides the approximate solution at certain states of the problem. In reinforcement learning, many approximate solutions are smaller and easier to produce than "flat" solutions that maintain distinct parameters for each fully enumerated state, but the best solution within the constraints of the approximation may fall well short of global optimality. This thesis proposes that the remaining gap to global optimality can be efficiently minimised by learning a small patch over the approximate solution. In order to improve the agent’s behaviour, algorithms are presented for learning the overriding patch. The patch is grown around particular regions of the problem where the approximate solution is found to be deficient. Two heuristic strategies are proposed for concentrating resources to those areas where inaccuracies in the approximate solution are most costly, drawing a compromise between solution quality and storage requirements. Patching also handles problems with continuous state variables, by two alternative methods: Kuhn triangulation over a fixed discretisation and nearest neighbour interpolation with a variable discretisation. As well as improving the agent’s behaviour, patching is also applied to the agent’s model of the environment. Inaccuracies in the agent’s model of the world are detected by statistical testing, using a selective sampling strategy to limit storage requirements for collecting data. The patching algorithms are demonstrated in several problem domains, illustrating the effectiveness of patching under a wide range of conditions. A scenario drawn from a real-time strategy game demonstrates the ability of patching to handle large complex tasks. These contributions combine to form a general framework for patching over approximate solutions in reinforcement learning. 
Complex problems cannot be solved by brute force alone, and some form of approximation is necessary to handle large problems. However, this does not mean that the limitations of approximate solutions must be accepted without question. Patching demonstrates one way in which an agent can leverage approximation techniques without losing the ability to handle fine yet important details.
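The core idea in the abstract above, a small patch that overrides an approximate solution at selected states, can be illustrated with a minimal sketch. This is not the thesis's algorithm: the class name `PatchedPolicy`, the method `override`, and the dictionary-based patch are all hypothetical choices made for illustration, and the sketch says nothing about how the thesis decides where to grow the patch.

```python
from typing import Callable, Dict, Hashable

State = Hashable
Action = int


class PatchedPolicy:
    """Approximate base policy plus a sparse table of per-state overrides.

    Hypothetical illustration only; the patch-learning heuristics described
    in the abstract are not implemented here.
    """

    def __init__(self, base_policy: Callable[[State], Action]) -> None:
        self.base_policy = base_policy
        self.patch: Dict[State, Action] = {}  # grows only where needed

    def override(self, state: State, action: Action) -> None:
        # Record a correction for a state where the base policy is deficient.
        self.patch[state] = action

    def __call__(self, state: State) -> Action:
        # Patched states take precedence; all others fall back to the base.
        if state in self.patch:
            return self.patch[state]
        return self.base_policy(state)


# Usage: a base policy that always picks action 0, patched at state 3.
policy = PatchedPolicy(lambda s: 0)
policy.override(3, 1)
```

Storage stays proportional to the number of patched states rather than to the size of the full state space, which is the trade-off between solution quality and storage that the abstract describes.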
54

Modelling motivation for experience-based attention focus in reinforcement learning

Merrick, Kathryn January 2007
Doctor of Philosophy / Computational models of motivation are software reasoning processes designed to direct, activate or organise the behaviour of artificial agents. Models of motivation inspired by psychological motivation theories permit the design of agents with a key reasoning characteristic of natural systems: experience-based attention focus. The ability to focus attention is critical for agent behaviour in complex or dynamic environments where only small amounts of available information is relevant at a particular time. Furthermore, experience-based attention focus enables adaptive behaviour that focuses on different tasks at different times in response to an agent’s experiences in its environment. This thesis is concerned with the synthesis of motivation and reinforcement learning in artificial agents. This extends reinforcement learning to adaptive, multi-task learning in complex, dynamic environments. Reinforcement learning algorithms are computational approaches to learning characterised by the use of reward or punishment to direct learning. The focus of much existing reinforcement learning research has been on the design of the learning component. In contrast, the focus of this thesis is on the design of computational models of motivation as approaches to the reinforcement component that generates reward or punishment. The primary aim of this thesis is to develop computational models of motivation that extend reinforcement learning with three key aspects of attention focus: rhythmic behavioural cycles, adaptive behaviour and multi-task learning in complex, dynamic environments. This is achieved by representing such environments using context-free grammars, modelling maintenance tasks as observations of these environments and modelling achievement tasks as events in these environments. 
Motivation is modelled by processes for task selection, the computation of experience-based reward signals for different tasks and arbitration between reward signals to produce a motivation signal. Two specific models of motivation based on the experience-oriented psychological concepts of interest and competence are designed within this framework. The first models motivation as a function of environmental experiences while the second models motivation as an introspective process. This thesis synthesises motivation and reinforcement learning as motivated reinforcement learning agents. Three models of motivated reinforcement learning are presented to explore the combination of motivation with three existing reinforcement learning components. The first model combines motivation with flat reinforcement learning for highly adaptive learning of behaviours for performing multiple tasks. The second model facilitates the recall of learned behaviours by combining motivation with multi-option reinforcement learning. In the third model, motivation is combined with an hierarchical reinforcement learning component to allow both the recall of learned behaviours and the reuse of these behaviours as abstract actions for future learning. Because motivated reinforcement learning agents have capabilities beyond those of existing reinforcement learning approaches, new techniques are required to measure their performance. The secondary aim of this thesis is to develop metrics for measuring the performance of different computational models of motivation with respect to the adaptive, multi-task learning they motivate. This is achieved by analysing the behaviour of motivated reinforcement learning agents incorporating different motivation functions with different learning components. Two new metrics are introduced that evaluate the behaviour learned by motivated reinforcement learning agents in terms of the variety of tasks learned and the complexity of those tasks. 
Persistent, multi-player computer game worlds are used as the primary example of complex, dynamic environments in this thesis. Motivated reinforcement learning agents are applied to control the non-player characters in games. Simulated game environments are used for evaluating and comparing motivated reinforcement learning agents using different motivation and learning components. The performance and scalability of these agents are analysed in a series of empirical studies in dynamic environments and environments of progressively increasing complexity. Game environments simulating two types of complexity increase are studied: environments with increasing numbers of potential learning tasks and environments with learning tasks that require behavioural cycles comprising more actions. A number of key conclusions can be drawn from the empirical studies, concerning both different computational models of motivation and their combination with different reinforcement learning components. Experimental results confirm that rhythmic behavioural cycles, adaptive behaviour and multi-task learning can be achieved using computational models of motivation as an experience-based reward signal for reinforcement learning. In dynamic environments, motivated reinforcement learning agents incorporating introspective competence motivation adapt more rapidly to change than agents motivated by interest alone. Agents incorporating competence motivation also scale to environments of greater complexity than agents motivated by interest alone. Motivated reinforcement learning agents combining motivation with flat reinforcement learning are the most adaptive in dynamic environments and exhibit scalable behavioural variety and complexity as the number of potential learning tasks is increased. However, when tasks require behavioural cycles comprising more actions, motivated reinforcement learning agents using a multi-option learning component exhibit greater scalability. 
Motivated multi-option reinforcement learning also provides a more scalable approach to recall than motivated hierarchical reinforcement learning. In summary, this thesis makes contributions in two key areas. Computational models of motivation and motivated reinforcement learning extend reinforcement learning to adaptive, multi-task learning in complex, dynamic environments. Motivated reinforcement learning agents allow the design of non-player characters for computer games that can progressively adapt their behaviour in response to changes in their environment.
55

Hierarchical average reward reinforcement learning

Seri, Sandeep 15 March 2002
Reinforcement Learning (RL) is the study of agents that learn optimal behavior by interacting with and receiving rewards and punishments from an unknown environment. RL agents typically do this by learning value functions that assign a value to each state (situation) or to each state-action pair. Recently, there has been a growing interest in using hierarchical methods to cope with the complexity that arises due to the huge number of states found in most interesting real-world problems. Hierarchical methods seek to reduce this complexity by the use of temporal and state abstraction. Like most RL methods, most hierarchical RL methods optimize the discounted total reward that the agent receives. However, in many domains, the proper criteria to optimize is the average reward per time step. In this thesis, we adapt the concepts of hierarchical and recursive optimality, which are used to describe the kind of optimality achieved by hierarchical methods, to the average reward setting and show that they coincide under a condition called Result Distribution Invariance. We present two new model-based hierarchical RL methods, HH-learning and HAH-learning, that are intended to optimize the average reward. HH-learning is a hierarchical extension of the model-based, average-reward RL method, H-learning. Like H-learning, HH-learning requires exploration in order to learn correct domain models and optimal value function. HH-learning can be used with any exploration strategy whereas HAH-learning uses the principle of "optimism under uncertainty", which gives it a built-in "auto-exploratory" feature. We also give the hierarchical and auto-exploratory hierarchical versions of R-learning, a model-free average reward method, and a hierarchical version of ARTDP, a model-based discounted total reward method. We compare the performance of the "flat" and hierarchical methods in the task of scheduling an Automated Guided Vehicle (AGV) in a variety of settings. 
The results show that hierarchical methods can take advantage of temporal and state abstraction and converge in fewer steps than the flat methods. The exception is the hierarchical version of ARTDP. We give an explanation for this anomaly. Auto-exploratory hierarchical methods are faster than the hierarchical methods with ε-greedy exploration. Finally, hierarchical model-based methods are faster than hierarchical model-free methods. / Graduation date: 2003
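For readers unfamiliar with the average-reward setting, the flat (non-hierarchical) R-learning update mentioned in the abstract can be sketched as follows. This follows the commonly presented tabular form of R-learning, in which an estimate `rho` of the average reward per time step replaces discounting; the function name, argument names, and step sizes are assumptions for illustration, and the hierarchical and auto-exploratory variants developed in the thesis are not reproduced here.

```python
from typing import List


def r_learning_update(
    q: List[List[float]],  # q[state][action], relative action values
    rho: float,            # current estimate of average reward per step
    s: int,
    a: int,
    r: float,
    s_next: int,
    alpha: float = 0.1,    # step size for q (assumed value)
    beta: float = 0.01,    # step size for rho (assumed value)
) -> float:
    """Apply one tabular R-learning step in place and return the new rho."""
    best_next = max(q[s_next])
    # Undiscounted update: the reward is measured relative to rho.
    q[s][a] += alpha * (r - rho + best_next - q[s][a])
    # rho is adjusted only when the action taken is greedy in s.
    if q[s][a] == max(q[s]):
        rho += beta * (r - rho + best_next - max(q[s]))
    return rho


# Usage: one transition from state 0 to state 1 with reward 1.
q_table = [[0.0, 0.0], [0.0, 0.0]]
new_rho = r_learning_update(q_table, 0.0, s=0, a=1, r=1.0, s_next=1,
                            alpha=0.5, beta=0.5)
```

Note that no discount factor appears: values are interpreted relative to the average reward, which is why this criterion suits continuing tasks such as the AGV scheduling domain named in the abstract.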
56

Learning with Deictic Representation

Finney, Sarah, Gardiol, Natalia H., Kaelbling, Leslie Pack, Oates, Tim 10 April 2002
Most reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Deictic representations are believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet, there are few experiments on learning with deictic representations reported in the literature. In this paper we explore the effectiveness of two forms of deictic representation and a naive propositional representation in a simple blocks-world domain. We find, empirically, that the deictic representations actually worsen performance. We conclude with a discussion of possible causes of these results and strategies for more effective learning in domains with objects.
57

A Reinforcement-Learning Approach to Power Management

Steinbach, Carl 01 May 2002
We describe an adaptive, mid-level approach to the wireless device power management problem. Our approach is based on reinforcement learning, a machine learning framework for autonomous agents. We describe how our framework can be applied to the power management problem in both infrastructure and ad hoc wireless networks. From this thesis we conclude that mid-level power management policies can outperform low-level policies and are more convenient to implement than high-level policies. We also conclude that power management policies need to adapt to the user and network, and that a mid-level power management framework based on reinforcement learning fulfills these requirements.
58

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Jaakkola, Tommi, Jordan, Michael I., Singh, Satinder P. 01 August 1993
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
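The link between Q-learning and stochastic approximation that the abstract refers to is easiest to see in the tabular update itself, which moves the current estimate a fraction `alpha` of the way toward a noisy sampled target. The sketch below shows that standard update only; the function and variable names are assumptions for illustration, and nothing here reproduces the paper's convergence theorem or its conditions on the step sizes.

```python
from typing import List


def q_learning_update(
    q: List[List[float]],  # q[state][action]
    s: int,
    a: int,
    r: float,
    s_next: int,
    alpha: float,          # step size
    gamma: float,          # discount factor
) -> None:
    """Apply one tabular Q-learning step in place."""
    # Sampled one-step target: a noisy estimate of the DP backup.
    target = r + gamma * max(q[s_next])
    # Stochastic-approximation form: old estimate plus step size times error.
    q[s][a] += alpha * (target - q[s][a])


# Usage: a single transition from state 0 to state 1 with reward 1.
q_values = [[0.0, 0.0], [1.0, 2.0]]
q_learning_update(q_values, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=0.5)
```

Written this way, the update has the familiar shape of an iterate corrected by a step size times a sampled error, which is the form that stochastic approximation theory analyses.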
59

Closed-Loop Learning of Visual Control Policies

Jodogne, Sébastien 05 December 2006
In this dissertation, I introduce a general, flexible framework for learning direct mappings from images to actions in an agent that interacts with its surrounding environment. This work is motivated by the paradigm of purposive vision. The original contributions consist in the design of reinforcement learning algorithms that are applicable to visual spaces. Inspired by the paradigm of local-appearance vision, these algorithms exploit specialized visual features that can be detected in the visual signal. Two different ways to use the visual features are described. Firstly, I introduce adaptive-resolution methods for discretizing the visual space into a manageable number of perceptual classes. To this end, a percept classifier that tests the presence or absence of few highly informative visual features is incrementally refined. New discriminant visual features are selected in a sequence of attempts to remove perceptual aliasing. Any standard reinforcement learning algorithm can then be used to extract an optimal visual control policy. The resulting algorithm is called "Reinforcement Learning of Visual Classes." Secondly, I propose to exploit the raw content of the visual features, without ever considering an equivalence relation on the visual feature space. Technically, feature regression models that associate visual features with a real-valued utility are introduced within the Approximate Policy Iteration architecture. This is done by means of a general, abstract version of Approximate Policy Iteration. This results in the "Visual Approximate Policy Iteration" algorithm. Another major contribution of this dissertation is the design of adaptive-resolution techniques that can be applied to complex, high-dimensional and/or continuous action spaces, simultaneously to visual spaces. The "Reinforcement Learning of Joint Classes" algorithm produces a non-uniform discretization of the joint space of percepts and actions. 
This is a brand new, general approach to adaptive-resolution methods in reinforcement learning that can deal with arbitrary, hybrid state-action spaces. Throughout this dissertation, emphasis is also put on the design of general algorithms that can be used in non-visual (e.g. continuous) perceptual spaces. The applicability of the proposed algorithms is demonstrated by solving several visual navigation tasks.
60

Constructing Neuro-Fuzzy Control Systems Based on Reinforcement Learning Scheme

Pei, Shan-cheng 10 September 2007
Traditionally, the fuzzy rules for a fuzzy controller are provided by experts. They cannot be trained from a set of input-output training examples because the correct response of the plant being controlled is delayed and cannot be obtained immediately. In this paper, we propose a novel approach to construct fuzzy rules for a fuzzy controller based on reinforcement learning. Our task is to learn from the delayed reward to choose sequences of actions that result in the best control. A neural network with delays is used to model the evaluation function Q. Fuzzy rules are constructed and added as the learning proceeds. Both the weights of the Q-learning network and the parameters of the fuzzy rules are tuned by gradient descent. Experimental results have shown that the fuzzy rules obtained perform effectively for control.
