About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Design of optimal neural network control strategies with minimal a priori knowledge

Paraskevopoulos, Vasileios January 2000
No description available.
42

Learning user modelling strategies for adaptive referring expression generation in spoken dialogue systems

Janarthanam, Srinivasan Chandrasekaran January 2011
We address the problem of dynamic user modelling for referring expression generation in spoken dialogue systems, i.e., how a spoken dialogue system should choose referring expressions for domain entities when talking to users with different levels of domain expertise, whose domain knowledge is initially unknown to the system. We approach this problem using a statistical planning framework: reinforcement learning techniques in Markov decision processes (MDPs). We present a new reinforcement learning framework to learn user modelling strategies for adaptive referring expression generation (REG) in resource-scarce domains (i.e., where no large corpus exists for learning). As part of the framework, we present novel user simulation models that are sensitive to the referring expressions used by the system and are able to simulate users with different levels of domain knowledge. These models are shown to simulate real user behaviour more closely than baseline user simulation models. In contrast to previous approaches to user-adaptive systems, we do not assume that the user’s domain knowledge is available to the system before the conversation starts. We show that, using a small corpus of non-adaptive dialogues, our framework can learn an adaptive user modelling policy in resource-scarce domains. We also show that the learned user modelling strategies adapted better than hand-coded baseline policies on both simulated and real users. With real users, the learned policy produced an increase of around 20% in adaptation compared with the best-performing hand-coded adaptive baseline. We also show that adapting to the user’s domain knowledge improves task success (99.47% for the learned policy vs. 84.7% for the hand-coded baseline) and reduces dialogue time (an 11% relative difference). This is because users found it easier to identify domain objects when the system used adaptive referring expressions during the conversations.
43

Neural mechanisms of suboptimal decisions

Chau, Ka Hung Bolton January 2014
Making good decisions and adapting flexibly to environmental change are critical to the survival of animals. In this thesis, I investigated the neural mechanisms underlying suboptimal decision making in humans and behavioural adaptation in monkeys, using functional magnetic resonance imaging (fMRI) in both species. In recent decades, research on the neuroscience of decision making has focused prominently on binary decisions, and whether the presence of an additional third option could affect behaviour and neural signals has been largely overlooked. I designed an experiment in which decisions were made between two options in the presence of a third option. A biophysical model simulation made the surprising prediction that more suboptimal decisions would be made in the presence of a very poor third alternative. Subsequent human behavioural testing produced results consistent with these predictions. In the ventromedial prefrontal cortex (vmPFC), I found that a value comparison signal that is critical for decision making became weaker in the presence of a poor-value third option. This effect contrasts with another prominent potential mechanism of multi-alternative decision making, divisive normalization, whose signatures were observed in the posterior parietal cortex. It has long been thought that the orbitofrontal cortex (OFC) and amygdala mediate reward-guided behavioural adaptation; however, this view has recently been challenged. I recorded whole-brain activity in macaques using fMRI while they performed an object discrimination reversal task over multiple testing sessions. I identified a lateral OFC (lOFC) region in which activity predicted adaptive win-stay/lose-shift behaviour. In contrast, anterior cingulate cortex (ACC) activity predicted future exploratory decisions regardless of reward outcome. Amygdala and lOFC activity was more strongly coupled for adaptive choice shifting and decoupled for task-irrelevant reward memory.
Day-to-day fluctuations in signals and signal coupling were correlated with day-to-day fluctuations in performance. These data demonstrate that the OFC, ACC, and amygdala each make unique contributions to flexible behaviour and credit assignment.
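The abstract contrasts the weakened vmPFC comparison signal with divisive normalization. A minimal sketch of normalization in one common simplified form, r_i = v_i / (sigma + sum_j v_j), assuming a hypothetical semi-saturation constant sigma and made-up option values; it is not the biophysical model used in the thesis. Under this form, a more valuable third option compresses the normalized difference between the two main options, while a very poor third option barely changes it, which is one way to see why the abstract treats the two mechanisms as contrasting.

```python
def divisive_normalization(values, sigma=1.0):
    """Divide each option value by sigma plus the summed value of the whole choice set."""
    denom = sigma + sum(values)
    return [v / denom for v in values]


def normalized_difference(v_a, v_b, v_c, sigma=1.0):
    """Normalized value difference between options A and B when a third option C is present."""
    r_a, r_b, _ = divisive_normalization([v_a, v_b, v_c], sigma)
    return r_a - r_b


# The A-B difference shrinks as the third option becomes more valuable.
for v_c in (0.0, 2.0, 8.0):
    print(v_c, round(normalized_difference(10.0, 8.0, v_c), 4))
```

With these illustrative values the printed differences fall from roughly 0.105 to 0.095 to 0.074 as the third option's value rises from 0 to 2 to 8.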
44

A Defender-Aware Attacking Guidance Policy for the TAD Differential Game

English, Jacob T. January 2020
No description available.
45

An Automated VNF Manager based on Parameterized-Action MDP and Reinforcement Learning

Li, Xinrui 15 April 2021
Managing and orchestrating the behaviour of virtualized network functions (VNFs) remains a major challenge due to their heterogeneity and the ever-increasing resource demands of the served flows. In this thesis, we propose a novel VNF manager (VNFM) that employs a parameterized-action reinforcement learning mechanism to simultaneously decide on the optimal VNF management action (e.g., migration, scaling, termination or rebooting) and the action's corresponding configuration parameters (e.g., migration location or amount of resources needed for scaling). More precisely, we first propose a novel parameterized-action Markov decision process (PAMDP) model to accurately describe each VNF, the instances of its components and their communication, as well as the set of permissible management actions by the VNFM and the rewards of realizing these actions. The use of parameterized actions allows us to rigorously represent the functionalities of the VNFM in order to perform various lifecycle management (LCM) operations on the VNFs. Next, we propose a two-stage reinforcement learning (RL) scheme that alternates between learning an action-value function for the discrete LCM actions and updating the action-parameter selection policy. In contrast to existing machine learning schemes, the proposed work uniquely provides a holistic management platform that unifies separate efforts targeting individual LCM functions such as VNF placement and scaling. Performance evaluation results demonstrate the efficiency of the proposed VNFM in maintaining the required performance level of the VNF while optimizing its resource configurations.
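The two-stage scheme described above alternates between learning values for the discrete LCM actions and improving the parameter attached to each action. The toy below is a minimal sketch of that alternation, not the thesis's VNFM: the one-step environment, the action set (scale, migrate, no-op), the reward shapes and the finite-difference parameter update are all hypothetical stand-ins, and in this single-state setting the Q-update reduces to a running average of rewards with no bootstrapping over future states.

```python
# Hypothetical one-step VNF environment: reward for applying LCM action `action`
# with continuous parameter x when the current load is `load`.
def reward(load, action, x):
    if action == "scale":    # x: resource units added; best when x matches the load
        return 10.0 - (x - load) ** 2
    if action == "migrate":  # x: normalized distance to the target host; shorter is better
        return 6.0 - 2.0 * abs(x)
    return 0.0               # "no-op" ignores its parameter


ACTIONS = ("scale", "migrate", "no-op")


def train_two_stage(load, rounds=50, alpha=0.5, step=0.1, eps=1e-3):
    params = {k: 0.0 for k in ACTIONS}  # parameter-selection policy: one scalar per action
    q = {k: 0.0 for k in ACTIONS}       # action-value estimates for the discrete actions
    for _ in range(rounds):
        # Stage 1: move each Q(k) toward the reward observed at its current parameter.
        for k in ACTIONS:
            q[k] += alpha * (reward(load, k, params[k]) - q[k])
        # Stage 2: hold Q fixed and improve each action's parameter by
        # finite-difference gradient ascent on the reward.
        for k in ACTIONS:
            x = params[k]
            grad = (reward(load, k, x + eps) - reward(load, k, x - eps)) / (2 * eps)
            params[k] = x + step * grad
    best = max(ACTIONS, key=q.get)      # greedy discrete action under the learned Q
    return best, params[best], q


best, x, q = train_two_stage(load=3.0)
```

For a load of 3.0 the toy settles on "scale" with a parameter close to 3.0, because once its parameter is tuned its value (near 10) exceeds that of "migrate" (6).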
46

Inverse Reinforcement Learning and Routing Metric Discovery

Shiraev, Dmitry Eric 01 September 2003
Uncovering the metrics and procedures employed by an autonomous networking system is an important problem with applications in instrumentation, traffic engineering, and game-theoretic studies of multi-agent environments. This thesis presents a method for using inverse reinforcement learning (IRL) techniques to discover a composite metric used by a dynamic routing algorithm on an Internet Protocol (IP) network. The network and routing algorithm are modeled as a reinforcement learning (RL) agent and a Markov decision process (MDP). The problem of routing metric discovery is then posed as the problem of recovering the reward function, given observed optimal behavior. We show that this approach is empirically suited to determining the relative contributions of the factors that constitute a composite metric. Experimental results for many classes of randomly generated networks are presented. / Master of Science
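The reward-recovery idea can be illustrated with a deliberately naive inverse-optimization sketch: assume the composite metric is a weighted sum of two link features, and search a grid of weights for those under which every observed route is a least-cost path. The four-node network, the (delay, load) features and the grid search are hypothetical; the thesis uses IRL techniques on an MDP model, not this brute-force search.

```python
from itertools import permutations

# Hypothetical undirected network: link -> (delay, load) feature vector.
LINKS = {
    ("A", "B"): (1.0, 5.0), ("B", "D"): (1.0, 5.0),
    ("A", "C"): (3.0, 1.0), ("C", "D"): (3.0, 1.0),
}


def link_features(u, v):
    return LINKS.get((u, v)) or LINKS.get((v, u))


def simple_paths(src, dst, nodes):
    """Enumerate loop-free paths by trying every ordering of intermediate nodes."""
    middle = [n for n in nodes if n not in (src, dst)]
    for r in range(len(middle) + 1):
        for mid in permutations(middle, r):
            path = (src, *mid, dst)
            if all(link_features(a, b) for a, b in zip(path, path[1:])):
                yield path


def path_cost(path, w):
    """Composite metric: weighted sum of link features along the path."""
    return sum(sum(wi * fi for wi, fi in zip(w, link_features(a, b)))
               for a, b in zip(path, path[1:]))


def consistent_weights(observed_routes, nodes, grid=10):
    """Weight vectors (w_delay, w_load) on a simplex grid under which
    every observed route is a minimum-cost path."""
    found = []
    for i in range(grid + 1):
        w = (i / grid, 1 - i / grid)
        if all(path_cost(route, w) <= min(path_cost(p, w)
                                          for p in simple_paths(route[0], route[-1], nodes))
               for route in observed_routes):
            found.append(w)
    return found


weights = consistent_weights([("A", "B", "D")], ["A", "B", "C", "D"])
```

Observing traffic on A-B-D is consistent only with delay weights from 0.7 to 1.0 on this grid (the exact boundary in this toy is 2/3), so the observed behavior bounds the relative contribution of each factor, which is the kind of conclusion the abstract describes.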
47

Altered Neural and Behavioral Associability-Based Learning in Posttraumatic Stress Disorder

Brown, Vanessa 24 April 2015
Posttraumatic stress disorder (PTSD) is accompanied by marked alterations in cognition and behavior, particularly when negative, high-value information is present (Aupperle, Melrose, Stein, & Paulus, 2012; Hayes, Vanelzakker, & Shin, 2012). However, the underlying processes are unclear; such alterations could result from differences in how this high-value information is updated or in its effects on the processing of future information. To untangle the effects of different aspects of behavior, we used a computational psychiatry approach to disambiguate the roles in PTSD of increased learning from previously surprising outcomes (i.e., associability; Li, Schiller, Schoenbaum, Phelps, & Daw, 2011) and from large value differences (i.e., prediction error; Montague, 1996; Schultz, Dayan, & Montague, 1997). Combat-deployed military veterans with varying levels of PTSD symptoms completed a learning task while undergoing fMRI; behavioral choices and neural activation were modeled using reinforcement learning. We found that associability-based loss learning at both the neural and behavioral levels increased with PTSD severity, particularly with hyperarousal symptoms, and that the interaction of PTSD severity and neural markers of associability-based learning predicted behavior. In contrast, PTSD severity did not modulate the prediction error neural signal or the behavioral learning rate. These results suggest that increased associability-based learning underlies neurobehavioral alterations in PTSD. / Master of Science
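The associability mechanism cited above (Li et al., 2011) is usually written as a hybrid of the Rescorla-Wagner and Pearce-Hall rules: the prediction error is delta_t = r_t - V_t, the value update is V_{t+1} = V_t + kappa * alpha_t * delta_t, and the associability alpha_{t+1} = eta * |delta_t| + (1 - eta) * alpha_t grows after surprising outcomes. A minimal sketch under that assumption; the parameter values and the loss sequence are illustrative, not fitted values from the thesis.

```python
def hybrid_associability(outcomes, kappa=0.5, eta=0.3, v0=0.0, alpha0=1.0):
    """Trial-by-trial quantities under a hybrid Pearce-Hall / Rescorla-Wagner rule.

    Returns one (value, associability, prediction_error) tuple per trial,
    recorded before that trial's update.
    """
    v, alpha = v0, alpha0
    history = []
    for r in outcomes:
        delta = r - v                                  # prediction error
        history.append((v, alpha, delta))
        v += kappa * alpha * delta                     # value update scaled by associability
        alpha = eta * abs(delta) + (1 - eta) * alpha   # associability tracks recent surprise
    return history


# Repeated losses (-1) followed by two surprising omissions of the loss (0).
trials = hybrid_associability([-1, -1, -1, -1, -1, -1, 0, 0])
```

Associability decays as the losses become predictable and jumps back up after the unexpected omission, so learning speeds up specifically after surprise; this is the component the abstract separates from the prediction error itself.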
48

Cocaine Use Modulates Neural Prediction Error During Aversive Learning

Wang, John Mujia 08 June 2015
Cocaine use has contributed to 5 million individuals falling into the cycle of addiction. Prior research on cocaine dependence has mainly focused on rewards. Losses also play a critical role in cocaine dependence, as dependent individuals fail to avoid social, health, and economic losses even when they acknowledge them. However, dependent individuals are extremely adept at escaping negative states such as withdrawal. To further understand whether cocaine use may contribute to dysfunctions in aversive learning, this paper uses fMRI and an aversive learning task to examine cocaine-dependent individuals when abstinent from cocaine use (C-) and when using as usual (C+). Of particular interest is the neural signal representing the actual loss compared with the expected loss, better known as the prediction error (δ), which individuals use to update future expectations. When abstinent (C-), dependent individuals exhibited a higher positive prediction error (δ+) signal in their striatum than when they were using as usual. Furthermore, their abstinence-related enhancements in the striatal δ+ signal were predicted by enhancements in the positive learning rate (α+). However, no relationships were found between abstinence-related enhancements in the negative learning rate (α-) and the negative prediction error (δ-) striatal signal. Abstinent (C-) individuals' striatal δ+ signal was predicted by a longer drug use history, signifying possible relief learning adaptations over time. Lastly, craving measures, especially the desire to use cocaine and the positive effects of cocaine, also correlated positively with C- individuals' striatal δ+ signal. This suggests possible relief learning adaptations in response to higher craving and withdrawal symptoms.
Taken together, the enhanced striatal δ+ signal when abstinent and the adaptations in relief learning provide evidence supporting dependent individuals' lack of aversive learning ability while using as usual, and their enhanced relief learning ability for the purpose of avoiding negative situations such as withdrawal, suggesting a neurocomputational mechanism that pushes the dependent individual to maintain dependence. / Master of Science
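The δ+/δ- and α+/α- quantities in this abstract correspond to a prediction-error update with separate learning rates for positive and negative errors. A minimal sketch, assuming outcomes are coded as -1 (loss) and 0 (loss avoided) and that a single expected value is tracked; the function names and learning-rate values are illustrative and are not the model fitted in the thesis. On a loss task, a positive prediction error means the outcome was better than expected, which is the relief signal the abstract discusses.

```python
def asymmetric_update(v, outcome, alpha_pos, alpha_neg):
    """One Rescorla-Wagner-style update with separate learning rates for
    positive (alpha_pos) and negative (alpha_neg) prediction errors."""
    delta = outcome - v                        # prediction error: actual minus expected
    alpha = alpha_pos if delta > 0 else alpha_neg
    return v + alpha * delta, delta


def run(outcomes, alpha_pos, alpha_neg, v0=0.0):
    """Return the final expected value and the trial-by-trial prediction errors."""
    v, deltas = v0, []
    for o in outcomes:
        v, d = asymmetric_update(v, o, alpha_pos, alpha_neg)
        deltas.append(d)
    return v, deltas


# Three losses, then three trials on which the loss is avoided.
losses = [-1, -1, -1, 0, 0, 0]
v_high_pos, _ = run(losses, alpha_pos=0.6, alpha_neg=0.2)
v_low_pos, _ = run(losses, alpha_pos=0.2, alpha_neg=0.2)
```

With the same negative learning rate, the run with the larger α+ ends with an expectation closer to 0: avoided losses (positive errors) pull the expected loss back up faster, while losses are learned at the same rate in both runs.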
49

Robot Navigation in Cluttered Environments with Deep Reinforcement Learning

Weideman, Ryan 01 June 2019
The application of robotics in cluttered and dynamic environments presents a wealth of challenges. This thesis proposes a system based on deep reinforcement learning that determines collision-free robot navigation velocities directly from a sequence of depth images and a desired direction of travel. The system is designed so that a real robot could be placed in an unmapped, cluttered environment and navigate in a desired direction with no prior knowledge. Deep Q-learning, coupled with the innovations of double Q-learning and dueling Q-networks, is applied. Two modifications of this architecture are presented to incorporate direction heading information that the reinforcement learning agent can use to learn how to navigate to target locations while avoiding obstacles. The performance of these two extensions of the dueling double deep Q-network (D3QN) architecture is evaluated in simulation, in simple and complex environments with a variety of common obstacles. Results show that both modifications enable the agent to successfully navigate to target locations, reaching 88% and 67% of goals in a cluttered environment, respectively.
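D3QN combines dueling Q-network heads with double Q-learning targets. The two ingredients can be sketched without any neural-network code, assuming the Q-values for the next state are already available as plain Python lists; the function names and numbers are hypothetical, and the thesis's direction-heading modifications are not shown.

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_q_target(reward, done, gamma, q_online_next, q_target_next):
    """Double Q-learning target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward                          # no bootstrapping from a terminal state
    best_next = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_next]


q_online_next = dueling_q(1.0, [2.0, 0.0, 1.0])    # [2.0, 0.0, 1.0]
q_target_next = dueling_q(0.5, [0.0, 3.0, 0.0])    # [-0.5, 2.5, -0.5]
y_double = double_q_target(1.0, False, 0.5, q_online_next, q_target_next)  # 0.75
y_vanilla = 1.0 + 0.5 * max(q_target_next)         # 2.25: plain DQN max over the target net
```

Here the online network prefers action 0 at the next state, so the double target evaluates that action with the target network (0.75) instead of taking the target network's own maximum (2.25); decoupling action selection from evaluation is what reduces the overestimation bias of plain Q-learning.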
50

The behavior of institutional investors in IPO markets and the decision of going public abroad

Fu, Youyan January 2016
This thesis comprehensively studies three questions. First, I use a unique set of institutional investor bids to examine the impact of personal experience on the behavior of institutional investors in an IPO market. I find that, when deciding whether to participate in future IPOs, institutions give more weight to the initial returns of past IPOs in which they submitted bids than to IPOs that they merely observed. In addition, initial returns from past IPOs in which institutions’ bids qualified for share allocation were given more consideration than those from IPOs for which unqualified bids were submitted. This phenomenon is consistent with reinforcement learning. I also find that institutions do not distinguish returns that derive from random events. Furthermore, institutions become more aggressive bidders after experiencing high returns in recent IPOs, conditional on having personally participated or having qualified for share allocation in those IPOs. This bidding behavior provides additional evidence of reinforcement learning in IPO markets. Second, I merge the dataset of institutional investor bids with post-IPO institutional holdings data to examine whether institutional investors such as fund companies reveal their true valuations through bids in a unique quasi-bookbuilding IPO mechanism. I find that fund companies do truthfully disclose their private information via bids, even though these bids carry no guaranteed compensation. My results contribute to the existing literature by providing new evidence on the information compensation theory, and they have implications for IPO mechanism design. Finally, I explore the impact of going public abroad on firm valuation, using a sample of 136 Chinese firms that conducted IPOs in the US during the period 1999-2012. I find that US-listed Chinese firms have higher price multiples and experience less underpricing than their domestic-listed peers.
The valuation premium persists when firm characteristics and listing costs are controlled for. These findings are consistent with theories of foreign listing. Moreover, I find that high-tech Chinese firms with high growth rates but low profitability are more likely to issue shares in the US, particularly in specific industries such as semiconductors, software, and online business services. This industry clustering is interpreted as an incentive to access foreign expertise through listing abroad.
