111

Trajectories of Risk Learning and Real-World Risky Behaviors During Adolescence

Wang, John M. 31 August 2020 (has links)
Adolescence is a transition period during which individuals have increasing autonomy in decision-making for themselves (Casey, Jones, and Hare, 2008), often choosing among options about which they have little knowledge and experience. This process of individuation and independence is reflected in real-world risk-taking behaviors (Silveri et al., 2004), including higher rates of motor vehicle accidents, unwanted pregnancies, sexually transmitted diseases, drug addiction, and death (Casey et al., 2008). The extent to which adolescents continue to display behaviors with negative consequences during this period of life depends critically on their ability to explore and learn the potential consequences of actions within novel environments. This learning is not limited to the value of the outcome associated with making choices, but extends to the levels of risk taken in making those choices. While the existing adolescence literature has focused on the neural substrates of risk preferences, how adolescents behaviorally and neurally learn about risks remains unknown. Success or failure in learning the potential variability of these consequences, or the risks involved, in ambiguous decisions is hypothesized to be a crucial process that allows individuals to make decisions based on their risk preferences. In the alternative, adolescents who fail to learn about the risks involved in their decisions are left in a state of continued exploration of the ambiguity, reflected as continued risk-taking behavior. This dissertation comprises two papers. The first is a perspective paper outlining a framework in which the risk-taking behavior observed during adolescence may be a product of each adolescent's ability to learn about risk. The second builds on that hypothesis by first examining the neural correlates of risk learning and quantifying individual risk learning abilities, and then examining longitudinal risk learning developmental trajectories in relation to real-world risk trajectories in adolescents. / Doctor of Philosophy / Adolescence is a transition period during which individuals have increasing autonomy in decision-making for themselves, often choosing among options about which they have little knowledge and experience. This process of individuation and independence begins with the adolescent exploring their world and the options they know little about. It is reflected in real-world risk-taking behaviors, including higher rates of motor vehicle accidents, unwanted pregnancies, sexually transmitted diseases, drug addiction, and death. We hypothesized and tested the premise that adolescents who fail to learn about the negative consequences of their actions while exploring will continue to partake in behaviors with negative consequences. This learning is not limited to the value of the outcome associated with making choices, but extends to the range of possible outcomes of those choices, or the risks involved. Indeed, individuals who fail to learn the risks involved in decisions with no known information show continued and greater risk-taking behavior, perhaps remaining in a state of continued exploration of the unknown.
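To make the abstract's core construct concrete, here is a minimal, hypothetical sketch of risk learning — not the dissertation's actual model — in which one delta rule tracks the expected outcome of an option and a second delta rule tracks its variability (the risk) via squared prediction errors:

```python
import random

def risk_learning(outcomes, lr_value=0.1, lr_risk=0.1):
    """Delta-rule learner tracking both the expected outcome (value) and
    the expected squared deviation from it (risk) of a choice option."""
    value, risk = 0.0, 1.0
    for outcome in outcomes:
        delta = outcome - value                 # value prediction error
        value += lr_value * delta               # update expected outcome
        risk += lr_risk * (delta ** 2 - risk)   # risk prediction error update
    return value, risk

# Same mean, different variability: only the risk estimate separates them.
risky = [random.gauss(1.0, 2.0) for _ in range(500)]
safe = [random.gauss(1.0, 0.1) for _ in range(500)]
print(risk_learning(risky))   # value near 1, risk near 4 (the variance)
print(risk_learning(safe))    # value near 1, risk near 0.01
```

An agent whose risk estimate never converges cannot act on its risk preferences, which is the hypothesized link to continued exploratory risk taking.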
112

Reinforcement Learning for Self-adapting Time Discretizations of Complex Systems

Gallagher, Conor Dietrich 27 August 2021 (has links)
The overarching goal of this project is to develop intelligent, self-adapting numerical algorithms for the time discretization of complex real-world problems using Q-Learning methodologies. The specific application is ordinary differential equations, which can model problems in mathematics and the social and natural sciences but usually require approximate solutions because direct analytical solutions are rare. Using the traditional Brusselator and Lorenz differential equations as test beds, this research develops models to determine reward functions and dynamically tunes controller parameters that minimize both the error and the number of steps required for approximate mathematical solutions. Our best reward function is based on an error measure that does not overly punish rejected states. The Alpha-Beta Adjustment and Safety Factor Adjustment Model is the most efficient and accurate method for solving these mathematical problems. Allowing the model to change the alpha/beta value and safety factor by small amounts provides better results than having the model choose values from discrete lists. This method shows potential for training dynamic controllers with Reinforcement Learning. / Master of Science / This research applies Q-Learning, a subset of Reinforcement Learning and Machine Learning, to solve complex mathematical problems that cannot be solved analytically and therefore require approximate solutions. Specifically, this research applies mathematical modeling of ordinary differential equations, which are used in many fields: theoretical sciences such as physics and chemistry, applied technical fields such as medicine and engineering, social and consumer-oriented fields such as finance and consumer purchasing habits, and the realms of national and international security and communications. Q-Learning develops mathematical models that make decisions and, depending on the outcome, learn whether the decision was good or bad, using this information to make the next decision. The research develops approaches to determine reward functions and controller parameters that minimize the error and number of steps associated with approximate mathematical solutions to ordinary differential equations. Error is how far the model's answer is from the true answer, and the number of steps is related to how long the solution takes and its computational cost. The Alpha-Beta Adjustment and Safety Factor Adjustment Model is the most efficient and accurate method for solving these mathematical problems and has potential for solving complex mathematical and societal problems.
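As a rough illustration of the approach — not the thesis's actual implementation — the sketch below pairs a step-doubling error estimate on the Brusselator with a small Q-table whose actions nudge the safety factor of a standard step-size controller. The state discretization, action set, and reward shaping (which, per the abstract, penalizes rejected steps only mildly) are all assumptions:

```python
import numpy as np

def brusselator(t, y, a=1.0, b=3.0):
    # Brusselator test problem from the thesis.
    x, z = y
    return np.array([a + x * x * z - (b + 1) * x, b * x - x * x * z])

def rk4(f, t, y, h):
    k1 = f(t, y); k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2); k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def step_with_error(f, t, y, h):
    # Step doubling gives a cheap local-error estimate.
    full = rk4(f, t, y, h)
    half = rk4(f, t + h / 2, rk4(f, t, y, h / 2), h / 2)
    return half, np.linalg.norm(full - half)

TOL, ACTIONS = 1e-6, (-0.05, 0.0, 0.05)   # actions nudge the safety factor
Q = np.zeros((12, len(ACTIONS)))
rng = np.random.default_rng(0)

def discretize(err):
    # State: order of magnitude of err/TOL, clipped to the Q-table.
    return int(np.clip(np.log10(max(err / TOL, 1e-12)) + 6, 0, 11))

def episode(alpha=0.1, gamma=0.9, eps=0.1):
    t, y, h, safety, steps = 0.0, np.array([1.5, 3.0]), 1e-3, 0.9, 0
    s = discretize(TOL)   # start from a neutral state
    while t < 1.0 and steps < 5000:
        a = rng.integers(3) if rng.random() < eps else int(np.argmax(Q[s]))
        safety = float(np.clip(safety + ACTIONS[a], 0.5, 1.0))
        y_new, err = step_with_error(brusselator, t, y, h)
        accepted = err <= TOL
        if accepted:
            t, y = t + h, y_new
        # Reward: stay near tolerance, with only a mild rejection penalty.
        reward = -abs(np.log10(max(err / TOL, 1e-12))) - (0 if accepted else 0.1)
        h = min(safety * h * (TOL / max(err, 1e-16)) ** 0.2, 0.1)
        s_next = discretize(err)
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
        s, steps = s_next, steps + 1
    return steps

for ep in range(50):
    episode()
```

The continuous "small adjustments" the abstract favors would replace the discrete action set above with bounded increments chosen by the agent.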
113

Inverse Reinforcement Learning and Routing Metric Discovery

Shiraev, Dmitry Eric 01 September 2003 (has links)
Uncovering the metrics and procedures employed by an autonomous networking system is an important problem with applications in instrumentation, traffic engineering, and game-theoretic studies of multi-agent environments. This thesis presents a method for using inverse reinforcement learning (IRL) techniques to discover the composite metric used by a dynamic routing algorithm on an Internet Protocol (IP) network. The network and routing algorithm are modeled as a reinforcement learning (RL) agent and a Markov decision process (MDP). The problem of routing metric discovery is then posed as one of recovering the reward function, given observed optimal behavior. We show that this approach is empirically suited to determining the relative contributions of the factors that constitute a composite metric. Experimental results for many classes of randomly generated networks are presented. / Master of Science
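The following sketch conveys the flavor of the idea under an assumed setup — the link features, the perceptron-style update, and the synthetic path choices are illustrative, not the thesis's algorithm. Given observed minimum-cost path choices, it recovers the relative weights of a composite metric:

```python
import numpy as np

# Hypothetical setup: each link has features (delay, 1/bandwidth, load),
# and the router is assumed to choose the path minimizing w . features
# under hidden weights w.
true_w = np.array([0.6, 0.3, 0.1])
rng = np.random.default_rng(1)

def path_cost(path, w):
    return path.sum(axis=0) @ w        # cost = weighted sum of link features

def observe():
    # One routing decision: candidate paths with 2-4 links each, plus the
    # path the router actually chose under the hidden composite metric.
    paths = [rng.random((int(rng.integers(2, 5)), 3)) for _ in range(4)]
    return min(paths, key=lambda p: path_cost(p, true_w)), paths

w = np.ones(3) / 3                     # initial guess: equal contributions
for _ in range(2000):
    chosen, paths = observe()
    for other in paths:
        # The observed (optimal) path should never cost more than an
        # alternative under the learned weights; nudge w when it does.
        if path_cost(chosen, w) > path_cost(other, w):
            w -= 0.01 * (chosen.sum(axis=0) - other.sum(axis=0))
            w = np.clip(w, 0, None)
            w /= w.sum()               # only relative contributions matter
print(w)                               # approaches true_w up to scaling
```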
114

Altered Neural and Behavioral Associability-Based Learning in Posttraumatic Stress Disorder

Brown, Vanessa 24 April 2015 (has links)
Posttraumatic stress disorder (PTSD) is accompanied by marked alterations in cognition and behavior, particularly when negative, high-value information is present (Aupperle, Melrose, Stein, & Paulus, 2012; Hayes, Vanelzakker, & Shin, 2012). However, the underlying processes are unclear; such alterations could result from differences in how this high-value information is updated or in its effects on the processing of future information. To untangle these aspects of behavior, we used a computational psychiatry approach to disambiguate the roles of increased learning from previously surprising outcomes (i.e., associability; Li, Schiller, Schoenbaum, Phelps, & Daw, 2011) and from large value differences (i.e., prediction error; Montague, 1996; Schultz, Dayan, & Montague, 1997) in PTSD. Combat-deployed military veterans with varying levels of PTSD symptoms completed a learning task while undergoing fMRI; behavioral choices and neural activation were modeled using reinforcement learning. We found that associability-based loss learning at the neural and behavioral levels increased with PTSD severity, particularly with hyperarousal symptoms, and that the interaction of PTSD severity and neural markers of associability-based learning predicted behavior. In contrast, PTSD severity did not modulate the prediction error neural signal or the behavioral learning rate. These results suggest that increased associability-based learning underlies neurobehavioral alterations in PTSD. / Master of Science
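The associability mechanism referenced above has a compact computational form; the sketch below is an illustrative hybrid Rescorla-Wagner/Pearce-Hall update in the spirit of Li et al. (2011), with assumed parameter values rather than the study's fitted model:

```python
def hybrid_update(value, assoc, outcome, kappa=0.3, eta=0.5):
    """One trial of a hybrid model: the effective learning rate is gated
    by associability, which itself tracks recent surprise |delta|."""
    delta = outcome - value                       # prediction error
    value += kappa * assoc * delta                # associability-gated update
    assoc = (1 - eta) * assoc + eta * abs(delta)  # surprise drives associability
    return value, assoc

# A mid-stream reversal keeps associability (and thus learning) high.
value, assoc = 0.0, 1.0
for outcome in [1, 1, 1, 1, 0, 0, 0, 0]:
    value, assoc = hybrid_update(value, assoc, outcome)
    print(f"value={value:.2f}  associability={assoc:.2f}")
```

Elevated associability-based learning, as reported here, corresponds to the surprise-tracking term dominating the update.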
115

Cocaine Use Modulates Neural Prediction Error During Aversive Learning

Wang, John Mujia 08 June 2015 (has links)
Cocaine use has contributed to 5 million individuals falling into the cycle of addiction. Prior research on cocaine dependence has mainly focused on rewards. Losses also play a critical role in cocaine dependence, as dependent individuals fail to avoid social, health, and economic losses even when they acknowledge them. However, dependent individuals are extremely adept at escaping negative states like withdrawal. To further understand whether cocaine use may contribute to dysfunctions in aversive learning, this paper uses fMRI and an aversive learning task to examine cocaine-dependent individuals abstinent from cocaine use (C-) and using as usual (C+). Of specific interest is the neural signal representing the actual loss compared to the expected loss, better known as the prediction error (δ), which individuals use to update future expectations. When abstinent (C-), dependent individuals exhibited a stronger positive prediction error (δ+) signal in their striatum than when they were using as usual. Furthermore, their striatal δ+ signal enhancements from drug abstinence were predicted by greater enhancements of the positive learning rate (α+). However, no relationships were found between drug-abstinence enhancements of the negative learning rate (α-) and negative prediction error (δ-) striatal signals. Abstinent (C-) individuals' striatal δ+ signal was predicted by longer drug use history, signifying possible relief learning adaptations over time. Lastly, craving measures, especially the desire to use cocaine and the positive effects of cocaine, also correlated positively with C- individuals' striatal δ+ signal, suggesting possible relief learning adaptations in response to stronger craving and withdrawal symptoms. Taken together, the enhanced striatal δ+ signal when abstinent and the adaptations in relief learning provide evidence that dependent individuals lack aversive learning ability while using as usual and show enhanced relief learning for the purpose of avoiding negative situations such as withdrawal, suggesting a neurocomputational mechanism that pushes the dependent individual to maintain dependence. / Master of Science
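The α+/α- distinction in the abstract corresponds to an asymmetric-learning-rate update; the following sketch is illustrative (the parameter values and outcome sequence are assumptions, not the study's task or fitted values):

```python
def asymmetric_update(value, outcome, alpha_pos=0.3, alpha_neg=0.1):
    """Value update with separate learning rates for positive (delta+)
    and negative (delta-) prediction errors, mirroring the alpha+/alpha-
    distinction in the abstract."""
    delta = outcome - value
    alpha = alpha_pos if delta > 0 else alpha_neg
    return value + alpha * delta, delta

# In an aversive task, "better than expected" (delta+) can mean relief
# from an expected loss; a blunted alpha- leaves losses under-learned.
value = 0.0
for outcome in [-1.0, -1.0, 0.0, -1.0, 0.0, 0.0]:
    value, delta = asymmetric_update(value, outcome)
    print(f"expected={value:+.2f}  delta={delta:+.2f}")
```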
116

Reinforcing Reachable Routes

Thirunavukkarasu, Muthukumar 13 May 2004 (has links)
Reachability routing is a newly emerging paradigm in networking, where the goal is to determine all paths between a sender and a receiver. It is becoming relevant with the changing dynamics of the Internet and the emergence of low-bandwidth wireless/ad hoc networks. This thesis presents the case for reinforcement learning (RL) as the framework of choice to realize reachability routing, within the confines of the current Internet backbone infrastructure. The setting of the reinforcement learning problem offers several advantages, including loop resolution, multi-path forwarding capability, cost-sensitive routing, and minimal state overhead, while maintaining the incremental spirit of current backbone routing algorithms. We present the design and implementation of a new reachability algorithm that uses a model-based approach to achieve cost-sensitive multi-path forwarding. Assessment in various troublesome topologies shows consistently superior performance over classical reinforcement learning algorithms. Evaluations of the algorithm against different criteria on many types of randomly generated networks, as well as on realistic topologies, are presented. / Master of Science
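For context, the classical RL baseline such an algorithm is typically compared against is Q-routing (after Boyan & Littman); the sketch below shows that per-hop update on a toy four-node topology. It is illustrative only, not the thesis's model-based algorithm:

```python
import numpy as np

# Q[n][d, a]: estimated hop cost from node n to destination d when
# forwarding to the a-th neighbor of n.
links = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
Q = {n: np.zeros((4, len(nbrs))) for n, nbrs in links.items()}
rng = np.random.default_rng(0)

def route(src, dst, alpha=0.5, eps=0.1):
    n, hops = src, 0
    while n != dst and hops < 50:
        nbrs = links[n]
        a = rng.integers(len(nbrs)) if rng.random() < eps else int(np.argmin(Q[n][dst]))
        nxt = nbrs[a]
        # Bootstrapped target: one hop plus the neighbor's best estimate.
        target = 1.0 + (0.0 if nxt == dst else Q[nxt][dst].min())
        Q[n][dst, a] += alpha * (target - Q[n][dst, a])
        n, hops = nxt, hops + 1
    return hops

for _ in range(500):
    route(0, 3)
print(Q[0][3])   # both neighbors converge near 2: two viable paths to node 3
```

Because both outgoing neighbors of node 0 end up with accurate cost estimates, the learned table supports exactly the kind of cost-sensitive multi-path forwarding the abstract describes.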
117

Bayesian methods for knowledge transfer and policy search in reinforcement learning

Wilson, Aaron (Aaron Creighton) 28 July 2012 (has links)
How can an agent generalize its knowledge to new circumstances? To learn effectively, an agent acting in a sequential decision problem must make intelligent action selection choices based on its available knowledge. This dissertation focuses on Bayesian methods of representing learned knowledge and develops novel algorithms that exploit the represented knowledge when selecting actions. Our first contribution introduces the multi-task Reinforcement Learning setting in which an agent solves a sequence of tasks. An agent equipped with knowledge of the relationship between tasks can transfer knowledge between them. We propose the transfer of two distinct types of knowledge: knowledge of domain models and knowledge of policies. To represent the transferable knowledge, we propose hierarchical Bayesian priors on domain models and policies, respectively. To transfer domain model knowledge, we introduce a new algorithm for model-based Bayesian Reinforcement Learning in the multi-task setting which exploits the learned hierarchical Bayesian model to improve exploration in related tasks. To transfer policy knowledge, we introduce a new policy search algorithm that accepts a policy prior as input and uses the prior to bias policy search. A specific implementation of this algorithm is developed that accepts a hierarchical policy prior. The algorithm learns the hierarchical structure and reuses components of the structure in related tasks. Our second contribution addresses the basic problem of generalizing knowledge gained from previously executed policies. Bayesian Optimization is a method of exploiting a prior model of an objective function to quickly identify the point maximizing the modeled objective. Successful use of Bayesian Optimization in Reinforcement Learning requires a model relating policies and their performance. Given such a model, Bayesian Optimization can be applied to search for an optimal policy. Early work using Bayesian Optimization in the Reinforcement Learning setting ignored the sequential nature of the underlying decision problem. The work presented in this thesis explicitly addresses this problem. We construct new Bayesian models that take advantage of sequence information to better generalize knowledge across policies. We empirically evaluate the value of this approach in a variety of Reinforcement Learning benchmark problems. Experiments show that our method significantly reduces the amount of exploration required to identify the optimal policy. Our final contribution is a new framework for learning parametric policies from queries presented to an expert. In many domains it is difficult to provide expert demonstrations of desired policies. However, it may still be a simple matter for an expert to identify good and bad performance. To take advantage of this limited expert knowledge, our agent presents experts with pairs of demonstrations and asks which of the demonstrations best represents a latent target behavior. The goal is to use a small number of queries to elicit the latent behavior from the expert. We formulate a Bayesian model of the querying process, an inference procedure that estimates the posterior distribution over the latent policy space, and an active procedure for selecting new queries for presentation to the expert. We show, in multiple domains, that the algorithm successfully learns the target policy and that the active learning strategy generally improves the speed of learning. / Graduation date: 2013
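As a minimal illustration of Bayesian Optimization for policy search, consider the sketch below: a Gaussian process surrogate over a 1-D policy parameter with an upper-confidence acquisition rule. The RBF kernel, the synthetic rollout function standing in for real episodes, and all constants are assumptions; the dissertation's contribution would replace the simple kernel with sequence-aware models of trajectories:

```python
import numpy as np

def kernel(a, b, length=0.2):
    # Plain RBF kernel; the thesis's sequence-aware kernels go here.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def rollout_return(theta, rng):
    # Stand-in for executing the policy: hidden objective plus noise.
    return np.exp(-8 * (theta - 0.3) ** 2) + 0.1 * rng.normal()

rng = np.random.default_rng(0)
X = np.array([0.1, 0.9])                      # initial policies tried
y = np.array([rollout_return(x, rng) for x in X])
grid = np.linspace(0, 1, 200)

for _ in range(15):
    K = kernel(X, X) + 1e-2 * np.eye(len(X))  # noisy GP posterior
    Ks = kernel(grid, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    ucb = mu + 2.0 * np.sqrt(np.maximum(var, 0.0))   # optimism under
    theta = grid[int(np.argmax(ucb))]                # uncertainty
    X = np.append(X, theta)
    y = np.append(y, rollout_return(theta, rng))

print(X[int(np.argmax(y))])   # best policy parameter found (near 0.3)
```

The surrogate's role is exactly the "model relating policies and their performance" named in the abstract: each rollout is expensive, so the acquisition rule decides which policy to try next.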
118

Embodied Evolution of Learning Ability

Elfwing, Stefan January 2007 (has links)
Embodied evolution is a methodology for evolutionary robotics that mimics the distributed, asynchronous, and autonomous properties of biological evolution. Evaluation, selection, and reproduction are carried out through cooperation and competition among the robots, without any need for human intervention. An embodied evolution framework is therefore well suited to studying the adaptive learning mechanisms of artificial agents that share the same fundamental constraints as biological agents: self-preservation and self-reproduction. The main goal of the research in this thesis has been to develop a framework for performing embodied evolution with a limited number of robots, by utilizing time-sharing of subpopulations of virtual agents inside each robot. The framework integrates reproduction as a directed autonomous behavior, and allows for learning of basic behaviors for survival by reinforcement learning. The purpose of the evolution is to evolve the learning ability of the agents, by optimizing meta-properties in reinforcement learning, such as the selection of basic behaviors, meta-parameters that modulate the efficiency of the learning, and additional and richer reward signals that guide the learning in the form of shaping rewards. The realization of the embodied evolution framework has been a cumulative research process in three steps: 1) investigation of the learning of a cooperative mating behavior for directed autonomous reproduction; 2) development of an embodied evolution framework, in which the selection of pre-learned basic behaviors and the optimization of battery recharging are evolved; and 3) development of an embodied evolution framework that includes meta-learning of basic reinforcement learning behaviors for survival, and in which the individuals are evaluated by an implicit and biologically inspired fitness function that promotes reproductive ability. The proposed embodied evolution methods have been validated in a simulation environment of the Cyber Rodent robot, a robotic platform developed for embodied evolution purposes. The evolutionarily obtained solutions have also been transferred to the real robotic platform. The evolutionary approach to meta-learning has also been applied to the automatic design of task hierarchies in hierarchical reinforcement learning, and to co-evolving meta-parameters and potential-based shaping rewards to accelerate reinforcement learning, both with regard to finding initial solutions and to convergence to robust policies. / QC 20100706
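A drastically simplified, centralized caricature of the outer loop described above may help fix ideas: evolution tunes RL meta-parameters while each individual learns within its lifetime. The bandit task, fitness, and selection scheme below are placeholders, not the distributed Cyber Rodent experiments:

```python
import random

def lifetime_fitness(lr, eps, trials=300):
    # Within-life learning: a two-armed bandit learned with the
    # individual's evolved learning rate (lr) and exploration rate (eps).
    q, total = [0.0, 0.0], 0.0
    for _ in range(trials):
        a = random.randrange(2) if random.random() < eps else q.index(max(q))
        reward = random.gauss((0.3, 0.7)[a], 0.2)    # arm 1 is better
        q[a] += lr * (reward - q[a])
        total += reward
    return total

population = [(random.uniform(0.01, 1.0), random.uniform(0.0, 0.5))
              for _ in range(20)]
for generation in range(30):
    ranked = sorted(population, key=lambda p: lifetime_fitness(*p), reverse=True)
    # Truncation selection plus Gaussian mutation of the meta-parameters.
    population = [(min(1.0, max(0.01, lr + random.gauss(0, 0.05))),
                   min(0.5, max(0.0, eps + random.gauss(0, 0.02))))
                  for lr, eps in ranked[:5] for _ in range(4)]
print(ranked[0])   # evolved meta-parameters: moderate lr, small eps
```

In the thesis the selection step is itself embodied — carried out by the robots' mating behavior — rather than by a central sort as above.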
119

Adaptation-based programming

Bauer, Tim (Timothy R.) 31 January 2013 (has links)
Partial programming is a field of study where users specify an outline or skeleton of a program, but leave various parts undefined. The undefined parts are then completed by an external mechanism to form a complete program. Adaptation-Based Programming (ABP) is a method of partial programming that utilizes techniques from the field of reinforcement learning (RL), a subfield of machine learning, to find good completions of those partial programs. An ABP user writes a partial program in some host programming language. At various points where the programmer is uncertain of the best course of action, they include choices that non-deterministically select amongst several options. Additionally, users indicate program success through a reward construct somewhere in their program. The resulting non-deterministic program is completed by treating it as an equivalent RL problem and solving that problem with techniques from the field. Over repeated executions, the RL algorithms within the ABP system learn to select choices at various points that maximize the reward received. This thesis explores various aspects of ABP, such as the semantics of different implementations, including the design trade-offs encountered with each approach. The goal of all approaches is to present a model for programs that adapt to their environment based on the points of uncertainty within the program that the programmer has indicated. The first approach presented in this work is an implementation of ABP as a domain-specific language embedded within a functional language. This language provides constructs for common patterns and situations that arise in adaptive programs. This language proves to be compositional and to foster rapid experimentation with different adaptation methods (e.g., learning algorithms). A second approach presents an implementation of ABP as an object-oriented library that models adaptive programs as formal systems from the field of RL called Markov Decision Processes (MDPs). This approach abstracts away many of the details of the learning algorithm from the casual user and uses a fixed learning algorithm to control the program adaptation rather than allowing it to vary. This abstraction results in an easier-to-use library, but limits the scenarios in which ABP can effectively be used. Moreover, treating adaptive programs as MDPs leads to some unintuitive situations where seemingly reasonable programs fail to adapt efficiently. This work addresses this problem with algorithms that analyze the adaptive program's structure and data flow to boost the rate at which these problematic adaptive programs learn, thus increasing the number of problems that ABP can effectively be used to solve. / Graduation date: 2013
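A toy sketch of the ABP idea follows — a choose construct completed by a bandit-style learner plus a reward call. The class and method names are illustrative, not the actual API from the thesis:

```python
import random
from collections import defaultdict

class Adaptive:
    """Completes non-deterministic choice points via epsilon-greedy
    value learning; reward() credits the choices made this episode."""
    def __init__(self, eps=0.1, lr=0.2):
        self.q = defaultdict(float)
        self.eps, self.lr = eps, lr
        self.trace = []

    def choose(self, label, options):
        # A choice point the programmer left open in the partial program.
        if random.random() < self.eps:
            pick = random.choice(options)
        else:
            pick = max(options, key=lambda o: self.q[(label, o)])
        self.trace.append((label, pick))
        return pick

    def reward(self, r):
        # Credit every choice made on the way to this reward.
        for key in self.trace:
            self.q[key] += self.lr * (r - self.q[key])
        self.trace.clear()

adapt = Adaptive()
for episode in range(500):
    # The "partial program": the author leaves the strategy open.
    style = adapt.choose("style", ["terse", "verbose"])
    retries = adapt.choose("retries", [1, 2, 3])
    success = (style == "terse") and retries >= 2   # hidden best completion
    adapt.reward(1.0 if success else 0.0)
print(dict(adapt.q))   # learns to prefer terse style with >= 2 retries
```

The MDP-library approach described in the abstract fixes the learner inside such a class, while the embedded-DSL approach lets the programmer swap it out.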
120

Utilizing negative policy information to accelerate reinforcement learning

Irani, Arya John 08 June 2015 (has links)
A pilot study by Subramanian et al. on Markov decision problem task decomposition by humans revealed that participants break down tasks into both short-term subgoals with a defined end-condition (such as "go to food") and long-term considerations and invariants with no end-condition (such as "avoid predators"). In the context of Markov decision problems, behaviors having clear start and end conditions are well-modeled by an abstraction known as options, but no abstraction exists in the literature for continuous constraints imposed on the agent's behavior. We propose two representations to fill this gap: the state constraint (a set or predicate identifying states that the agent should avoid) and the state-action constraint (identifying state-action pairs that should not be taken). State-action constraints can be directly utilized by an agent, which must choose an action in each state, while state constraints require an approximation of the MDP’s state transition function to be used; however, it is important to support both representations, as certain constraints may be more easily expressed in terms of one as compared to the other, and users may conceive of rules in either form. Using domains inspired by classic video games, this dissertation demonstrates the thesis that explicitly modeling this negative policy information improves reinforcement learning performance by decreasing the amount of training needed to achieve a given level of performance. In particular, we will show that even the use of negative policy information captured from individuals with no background in artificial intelligence yields improved performance. We also demonstrate that the use of options and constraints together form a powerful combination: an option and constraint can be taken together to construct a constrained option, which terminates in any situation where the original option would violate a constraint. In this way, a naive option defined to perform well in a best-case scenario may still accelerate learning in domains where the best-case scenario is not guaranteed.
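A small sketch of the two constraint representations may clarify the distinction (the function names and toy task are illustrative): state-action constraints filter actions directly, while state constraints need a transition model to check predicted successors.

```python
# State constraint: states to avoid. State-action constraint: pairs to
# avoid. The model is the (approximate) transition function the abstract
# says state constraints require.
def allowed_actions(state, actions, sa_constraint, s_constraint, model):
    ok = []
    for a in actions:
        if (state, a) in sa_constraint:
            continue                      # state-action pair forbidden
        if model(state, a) in s_constraint:
            continue                      # predicted next state forbidden
        ok.append(a)
    return ok or actions                  # fall back if everything is cut

# Toy 1-D corridor: positions 0..9, a pit at 7.
actions = [-1, +1]
model = lambda s, a: min(9, max(0, s + a))    # deterministic transitions
s_constraint = {7}                            # "avoid the pit"
sa_constraint = {(0, -1)}                     # "don't push the left wall"

print(allowed_actions(6, actions, sa_constraint, s_constraint, model))  # [-1]
print(allowed_actions(0, actions, sa_constraint, s_constraint, model))  # [1]
```

A constrained option, in these terms, is an option whose policy is routed through a filter like allowed_actions and which terminates whenever its intended action would be vetoed.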
