1 |
Behavioral Training of Reward Learning Increases Reinforcement Learning Parameters and Decreases Depression Symptoms Across Repeated Sessions
Goyal, Shivani 12 1900
Background: Disrupted reward learning has been suggested to contribute to the etiology and maintenance of depression. If deficits in reward learning are core to depression, we would expect that improving reward learning would decrease depression symptoms over time. Whereas previous studies have shown that reward learning can be changed within a single study session, clinically meaningful change requires that this change endure beyond task completion and transfer to real-world environments. Using a longitudinal design, we investigate whether repeated sessions of behavioral training can change reward learning and decrease depression symptoms over time.
Methods: 929 online participants (497 depression-present; 432 depression-absent) recruited from Amazon’s Mechanical Turk platform completed a behavioral training paradigm and clinical self-report measures across up to eight study visits. Participants were randomly assigned to one of 12 arms of the behavioral training paradigm, in which they completed a probabilistic reward learning task interspersed with queries about a feature of the task environment (11 learning arms) or a control query (1 control arm). Learning queries trained participants on one of four computationally based learning targets known to affect reinforcement learning (probability, average or extreme outcome values, and value comparison processes). A reinforcement learning model previously shown to distinguish depression-related differences in learning was fit to behavioral responses using hierarchical Bayesian estimation, providing estimates of reward sensitivity and learning rate for each participant at each visit. Reward sensitivity captured how strongly participants differentiated high from low outcome values, while learning rate captured how much participants learned from previously experienced outcomes. Mixed linear models assessed relationships among model-agnostic task performance, computational model-derived reinforcement learning parameters, depression symptoms, and study progression.
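The abstract names the model only by its two parameters. A minimal Python sketch, assuming a common Rescorla-Wagner parameterisation in which reward sensitivity scales each outcome before the prediction error is computed and a softmax maps learned values to choices; this illustrates the idea and is not the exact model or the hierarchical fitting code used in the thesis. All function names are hypothetical.

```python
import math


def softmax(values):
    """Turn action values into choice probabilities (max-subtracted for stability)."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def update_q(q, action, reward, learning_rate, reward_sensitivity):
    """Rescorla-Wagner update with the outcome scaled by reward sensitivity.

    Returns the prediction error so it can be inspected.
    """
    prediction_error = reward_sensitivity * reward - q[action]
    q[action] += learning_rate * prediction_error
    return prediction_error


def log_likelihood(choices, rewards, learning_rate, reward_sensitivity, n_actions=2):
    """Log-likelihood of one participant's choices on one visit.

    A hierarchical Bayesian fit would combine this per-participant term
    with group-level priors on both parameters.
    """
    q = [0.0] * n_actions
    total = 0.0
    for action, reward in zip(choices, rewards):
        total += math.log(softmax(q)[action])
        update_q(q, action, reward, learning_rate, reward_sensitivity)
    return total
```

Under this assumed parameterisation there is no separate inverse temperature: a higher reward sensitivity pushes the learned values of high- and low-reward options further apart and so makes choices more deterministic, while the learning rate controls how quickly values track recent outcomes.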
Results: Across time, learning queries increased reward sensitivity in depression-absent participants (β = 0.036, p ≤ 0.001, 95% CI (0.022, 0.049)). In contrast, control queries did not change reward sensitivity in depression-absent participants across time (β = 0.016, p = 0.303, 95% CI (-0.015, 0.048)). Learning rates did not change across time for participants receiving learning queries (β = 0.001, p = 0.418, 95% CI (-0.002, 0.004)) or control queries (β = 0.002, p = 0.558, 95% CI (-0.005, 0.009)). Among the learning queries, those targeting value comparison processes improved depression symptoms (β = -0.509, p = 0.015, 95% CI (-0.912, -0.106)) and increased reward sensitivity across time (β = 0.052, p ≤ 0.001, 95% CI (0.030, 0.075)) in depression-present participants. In these participants, increased reward sensitivity was related to decreased depression symptoms across time (β = -2.905, p = 0.002, 95% CI (-4.75, -1.114)).
Conclusions: Multiple sessions of targeted behavioral training improved reward learning for participants with a range of depression symptoms. Improved behavioral reward learning was associated with improved clinical symptoms over time, possibly because learning transferred to real-world scenarios. These results support disrupted reward learning as a mechanism contributing to the etiology and maintenance of depression and suggest the potential of repeated behavioral training to target deficits in reward learning. / Master of Science / Disrupted reward learning has been suggested to be central to depression. Investigating how changes in reward learning affect clinical symptoms could clarify the role of reward learning in depression. Here, we address this question by investigating whether multiple sessions of behavioral training change reward learning and decrease depression symptoms over time. We recruited 929 online participants to complete up to eight study visits. At each visit, participants completed a depression questionnaire and one of 12 arms of a behavioral training paradigm, in which they completed a reward learning task interspersed with queries about the task. Queries trained participants on one of four learning targets known to affect reward learning (probability, average or extreme outcome values, and value comparison processes). We used reinforcement learning models to quantify specific reward learning processes, including how much participants valued high vs. low outcomes (reward sensitivity) and how much participants learned from previously experienced outcomes (learning rate). Across study visits, we found that participants without depression symptoms who completed the targeted behavioral training increased their reward sensitivity (β = 0.036, p ≤ 0.001, 95% CI (0.022, 0.049)).
Among the queries, those targeting value comparison processes improved both depression symptoms (β = -0.509, p = 0.015, 95% CI (-0.912, -0.106)) and reward sensitivity (β = 0.052, p ≤ 0.001, 95% CI (0.030, 0.075)) across study visits for participants with depression symptoms. These results suggest that multiple sessions of behavioral training can increase reward learning over time for participants with and without depression symptoms. Further, these results support the role of disrupted reward learning in depression and suggest the potential for behavioral training to improve both reward learning and symptoms in depression.
|
2 |
The Effects of Self-Relevance on Neural Learning Signals Indexing Attention, Perception, and Learning
Rocha Hammerstrom, Mathew 28 September 2022
Humans tend to preferentially process information relevant to themselves. For instance, in experiments where participants learn to manipulate stimuli referenced to themselves or to someone else, participants exhibit larger reward processing signals for themselves. Additionally, attention and perception are biased not only toward oneself but also toward those related to oneself. However, how information related to known others is processed has not been addressed in reward learning. Here, I sought to address this issue. Specifically, I recorded electroencephalographic (EEG) data from 15 undergraduate student participants who played a simple two-choice “bandit” gambling game in which a photo presented before each gamble indicated whether it benefited the participant, an individual they knew, or a stranger. EEG data from 64 electrodes in a standard 10-20 layout were analyzed for event-related potentials (ERPs) elicited by target photos and gambling outcomes. After the experiment, I examined the relationship between relatedness and the amplitudes of reward learning ERPs, namely the reward positivity and the P300, using one-way repeated measures analyses of variance. My results demonstrate that the amplitudes of reward learning ERPs are sensitive to the target of a gamble. A secondary goal of this research was to determine whether these differences could be explained by attentional and perceptual responses to cues indicating whom a given gamble was for. Indeed, stepwise linear regression analyses identified the P2, N2, and P3, which indexed relevance to self, as predictors of the resulting reward signals. My findings provide further evidence that a reward learning system within the medial-frontal cortex is sensitive to others of varying self-relevance, which may be a function of biases in attention and perception. / Graduate
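The abstract compares ERP amplitudes across three gamble targets with one-way repeated-measures ANOVAs. A minimal plain-Python sketch of how that F statistic is computed, assuming a complete participants-by-conditions table; the amplitudes below are hypothetical, no sphericity correction is applied, and the function name is illustrative.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data: one row per participant, one column per condition
    (e.g. self, known other, stranger). Returns (F, df_conditions, df_error).
    """
    n = len(data)        # participants
    k = len(data[0])     # conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)   # removed from the error term
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj                   # condition x participant residual
    if ss_error == 0:
        raise ValueError("zero residual variance: F is undefined")

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


# Hypothetical amplitudes (µV): rows = participants,
# columns = self, known other, stranger.
amplitudes = [[6, 4, 2], [6, 3, 3], [6, 5, 1]]
print(*rm_anova_oneway(amplitudes))  # -> 12.0 2 4
```

The p-value would then come from the upper tail of the F distribution with (df_conditions, df_error) degrees of freedom, for example via `scipy.stats.f.sf`. Subtracting the participant sum of squares is what distinguishes the repeated-measures design from a between-subjects ANOVA.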
|
3 |
Ethanol experience induces metaplasticity of NMDA receptor-mediated transmission in ventral tegmental area dopamine neurons
Bernier, Brian Ernest 31 October 2011
Addiction is thought to arise, in part, from a maladaptive learning process in which enduring memories of drug-related experiences are formed, resulting in persistent and uncontrollable drug-seeking behavior. However, it is well known that both acute and chronic alcohol (ethanol) exposures impair various types of learning and memory in both humans and animals. Consistent with these observations, both acute and chronic exposures to ethanol suppress synaptic plasticity, the major neural substrate for learning and memory, in multiple brain areas. Therefore, it remains unclear how powerful memories associated with alcohol experience are formed during the development of alcoholism.
The mesolimbic dopaminergic system is critically involved in the learning of information related to rewards, including drugs of abuse. Both natural and drug rewards, such as ethanol, cause release of dopamine in the nucleus accumbens and other limbic structures, which is thought to drive learning by enhancing synaptic plasticity. Accumulating evidence indicates that plasticity of glutamatergic transmission onto dopamine neurons may play an important role in the development of addiction. Plasticity of NMDA receptor (NMDAR)-mediated transmission may be of particular interest, as NMDAR activation is necessary for dopamine neuron burst firing and phasic dopamine release in projection areas that occurs in response to rewards or reward-predicting stimuli. NMDAR plasticity may, therefore, drive the learning of stimuli associated with rewards, including drugs of abuse.
This dissertation finds that repeated in vivo ethanol exposure induces a metaplasticity of NMDAR-mediated transmission in mesolimbic dopamine neurons, expressed as an increased susceptibility to the induction of NMDAR long-term potentiation (LTP). Enhancement of NMDAR plasticity results from an increase in the potency of inositol 1,4,5-trisphosphate (IP3) in facilitating the action potential-evoked Ca2+ signals critical for LTP induction. Interestingly, amphetamine exposure produces a similar enhancement of IP3 receptor (IP3R) function, suggesting this neuroadaptation may be a common response to multiple drugs of abuse. Additionally, ethanol-treated mice display enhanced learning of cues associated with cocaine exposure. These findings suggest that metaplasticity of NMDAR LTP may contribute to the formation of powerful memories related to drug experiences and provide important insight into the learning component of addiction. / text
|
4 |
Positive Affect Promotes Unbiased and Flexible Attention: Towards a Dopaminergic Model of Positivity
You, Yuqi 05 January 2012
A review of the extant literature on positive affect suggested that it has two major dimensions: a hedonic dimension related to subjective feelings and reward processing, and a cognitive dimension related to affect-specific changes in perception and cognition. A novel dopaminergic model was proposed to provide a unitary account of the effects of positive affect across the two dimensions. The model hypothesized that positive affect is associated with distinct modes of mesocortical and mesolimbic dopamine transmission, which in turn mediate unbiased, unfiltered, and flexible attention. Three separate behavioral tasks, on perception, attention, and reward learning, were conducted. In line with the hypothesis, positive affect was associated with less biased bistable perception, faster regain of attention to previously ignored information, and fewer perseverative errors in the face of changing reward contingencies.
|
6 |
Inferring the Human's Objective in Human Robot Interaction
Hoegerman, Joshua Thomas 03 May 2024
This thesis discusses the use of Bayesian inference to infer the human's objective in Human-Robot Interaction. More specifically, it focuses on adapting methods to make better use of the available information when inferring the human's objective in Reward Learning and Communicative Shared Autonomy settings. To accomplish this, we first examine state-of-the-art approaches to Bayesian Inverse Reinforcement Learning and explore their strengths and weaknesses. We then explore alternative approaches to the problem, borrowing methods from the statistics community to improve the sampling process over the human's belief. Next, I discuss the setting of Shared Autonomy in the presence and absence of communication. These differences are then explored through our method for inference in an environment where the human is aware of the robot's intention, and we show how this awareness can be used to dramatically improve the robot's ability to cooperate with the human and infer the human's objective. Overall, I conclude that using these methods to better infer the human's objective significantly improves the performance and cohesion of the human and robot agents in these settings. / Master of Science / This thesis discusses the use of various methods that allow robots to better understand human actions so that they can learn from and work with those humans. In this work, we focus on two areas of inferring the human's objective. The first is learning what the human prioritizes when completing certain tasks, making better use of the information inherent in the environment to learn those priorities so that a robot can replicate the given task.
The second body of work concerns Shared Autonomy, where we aim to have the robot better infer which task a human is going to perform, and thus better assist with that goal, by using communicative interfaces to change the information the robot uses to infer the human's intent. Collectively, this thesis argues that current inference methods for Human-Robot Interaction can be improved by further refining inference to better approximate the human's internal model in a given setting.
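A minimal sketch of Bayesian intent inference of the kind described above, assuming a Boltzmann-rational human (action probability proportional to exp(beta * Q)), a hypothetical 1-D world with three moves, and Q taken as the negative distance to a candidate goal. This is a textbook illustration, not the thesis's method; `goal_posterior`, `action_log_likelihood`, and `beta` are illustrative names.

```python
import math

MOVES = (-1, 0, 1)  # step left, stay, step right (hypothetical 1-D world)


def action_log_likelihood(state, action, goal, beta):
    """log P(action | state, goal) for a Boltzmann-rational human whose
    Q-value is the negative distance to the goal after moving."""
    q_values = [-abs(state + m - goal) for m in MOVES]
    log_z = math.log(sum(math.exp(beta * q) for q in q_values))
    return -beta * abs(state + action - goal) - log_z


def goal_posterior(start, actions, goals, beta=2.0, prior=None):
    """Posterior over candidate goals after observing a sequence of moves."""
    if prior is None:
        prior = [1.0 / len(goals)] * len(goals)
    log_post = [math.log(p) for p in prior]
    state = start
    for action in actions:
        for i, goal in enumerate(goals):
            log_post[i] += action_log_likelihood(state, action, goal, beta)
        state += action
    # Normalise in log space to avoid underflow on long trajectories.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


print(goal_posterior(0, [1, 1, 1], goals=[-5, 5]))  # mass concentrates on goal 5
```

A robot assisting under shared autonomy could act on this posterior, for example by helping toward the most probable goal. Communication could enter as an additional likelihood term for signals the human sends; that extension is not shown here.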
|
7 |
Perturbing Neural Feedback Loops to Understand the Relationships of Their Parts
January 2014
The basal ganglia are four subcortical nuclei associated with motor control and reward learning. They are part of numerous larger, mostly segregated loops in which the basal ganglia receive inputs from specific regions of cortex. Converging on these inputs are dopaminergic neurons that alter their firing based on received and/or predicted rewarding outcomes of a behavior. The output of the basal ganglia feeds through the thalamus back to the areas of cortex where the loop originated. Understanding the dynamic interactions between the various parts of these loops is critical to understanding the role of the basal ganglia in motor control and reward-based learning. This work developed several experimental techniques that can be applied to further study basal ganglia function. The first technique used micro-volume injections of low-concentration muscimol to decrease the firing rates of recorded neurons in a limited area of cortex in rats. Afterwards, an artificial cerebrospinal fluid flush was injected to rapidly eliminate the effects of muscimol. This technique contained the effects of muscimol to a volume of approximately 1 mm radius and limited the duration of the drug effect to less than one hour. It could be used to temporarily perturb a small portion of the loops involving the basal ganglia and then observe how these effects propagate to other connected regions. The second part applied self-organizing maps (SOMs) to find temporal patterns in neural firing rates that are independent of behavior. The frequency distribution of detected patterns on these maps can then be used to determine whether neural activity changes over time. The final technique focused on the role of the basal ganglia in reward learning. A new conditioning technique was created to increase the occurrence of selected patterns of neural activity without using any external reward or behavior. A pattern of neural activity in rat cortex was selected using an SOM and then reinforced by pairing it with electrical stimulation of the medial forebrain bundle, triggering dopamine release in the basal ganglia. Ultimately, this technique proved unsuccessful, possibly due to poor selection of the patterns being reinforced. / Dissertation/Thesis / Ph.D. Bioengineering 2014
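The second technique above uses self-organizing maps to find recurring firing-rate patterns and then compares how often each pattern occurs. A minimal 1-D SOM sketch in plain Python, assuming firing-rate snippets have already been binned into equal-length vectors; the node count, linear decay schedules, and initialisation from evenly spaced data samples are illustrative choices, not the dissertation's settings, and the function names are hypothetical.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(nodes)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(nodes[i], x)))


def train_som(data, n_nodes=4, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organizing map on equal-length vectors."""
    rng = random.Random(seed)
    # Initialise nodes from evenly spaced samples of the data (one simple choice).
    spacing = len(data) / n_nodes
    nodes = [list(data[int(i * spacing)]) for i in range(n_nodes)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 1e-3)  # neighbourhood width shrinks
            bmu = best_matching_unit(nodes, x)
            for i, w in enumerate(nodes):
                # Gaussian neighbourhood over distance along the 1-D node chain
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return nodes


def pattern_histogram(nodes, data):
    """How often each node is the best match; comparing these counts across
    sessions is one way to look for changes in activity over time."""
    counts = [0] * len(nodes)
    for x in data:
        counts[best_matching_unit(nodes, x)] += 1
    return counts
```

For real recordings, the histogram from an early session could be compared with one from a later session on the same trained map. A shift in which nodes win would suggest a change in the prevalence of the underlying activity patterns.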
|
8 |
CONTEXT AND SALIENCE: THE ROLE OF DOPAMINE IN REWARD LEARNING AND NEUROPSYCHIATRIC DISORDERS
Toulouse, Trent M. 04 1900
Evidence suggests that a change in the firing rate of dopamine (DA) cells is a major neurobiological correlate of learning. The Temporal Difference (TD) learning algorithm provides a popular account of the DA signal as conveying the error between expected and actual rewards. Other accounts have attempted to characterize the DA firing pattern as conveying surprise or salience. DA cells have also been implicated in several neuropsychiatric disorders, such as obsessive-compulsive disorder and schizophrenia. Compelling neuropsychological explanations of the DA signal also frame it as conveying salience. A model-based reinforcement learning algorithm using a salience signal analogous to that of dopamine neurons was built and used to model existing animal behavioral data.
Different reinforcement learning models were then compared under conditions of altered DA firing patterns. Several diverging predictions of the TD model and the salience model were compared against animal behavioral data from an obsessive-compulsive disorder (OCD) model using a dopamine agonist. The results show that the salience model's predictions more accurately match actual animal behavior.
The role of context in the salience model differs from its role in the standard TD-learning algorithm. Several predictions of the salience model for how people should respond to context shifts of differing salience were tested against known behavioral correlates of endogenous dopamine levels. As predicted, individuals with behavioral traits correlated with higher endogenous dopamine levels are far more sensitive to low-salience context shifts than those with traits correlated with lower endogenous dopamine levels. This is a unique prediction of the salience model of the DA signal, and it allows better integration of reinforcement learning models with neuropsychological frameworks for discussing the role of dopamine in learning, memory, and behavior. / Doctor of Science (PhD)
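The TD account mentioned above treats the DA signal as the prediction error delta = r + gamma * V(s') - V(s). A minimal TD(0) sketch in Python on a hypothetical cue-delay-reward trial; the salience model itself is not specified in the abstract, so only the TD side is shown, and the state names, `alpha`, and `gamma` values are illustrative assumptions.

```python
def td0_episode(values, states, rewards, alpha=0.1, gamma=0.9):
    """Run one TD(0) pass over a trajectory.

    values: dict mapping state -> value estimate (updated in place)
    states: visited states s_0..s_T, where s_T is terminal
    rewards: rewards r_1..r_T received on each transition
    Returns the TD errors, the quantity the TD account maps onto
    phasic dopamine firing.
    """
    deltas = []
    for t, r in enumerate(rewards):
        s, s_next = states[t], states[t + 1]
        terminal = (t + 1 == len(rewards))
        v_next = 0.0 if terminal else values.get(s_next, 0.0)
        delta = r + gamma * v_next - values.get(s, 0.0)
        values[s] = values.get(s, 0.0) + alpha * delta
        deltas.append(delta)
    return deltas


trial = ["cue", "delay", "end"]
values = {}
print(td0_episode(values, trial, [0.0, 1.0]))  # -> [0.0, 1.0]: error at the unexpected reward
for _ in range(500):
    last = td0_episode(values, trial, [0.0, 1.0])
# After training, the error at reward delivery is near zero and the cue
# state carries the discounted value of the upcoming reward (about gamma).
```

This reproduces the classic TD signature: the prediction error at reward delivery shrinks as the reward becomes predicted. A salience-based account would instead make the signal depend on how surprising or attention-grabbing an event is.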
|
9 |
Reinforcement learning and reward estimation for dialogue policy optimisation
Su, Pei-Hao January 2018
Modelling dialogue management as a reinforcement learning task enables a system to learn to act optimally by maximising a reward function. This reward function is designed to induce the system behaviour required for goal-oriented applications, which usually means fulfilling the user’s goal as efficiently as possible. However, in real-world spoken dialogue systems, the reward is hard to measure, because the goal of the conversation is often known only to the user. The system can, of course, ask the user whether the goal has been satisfied, but this can be intrusive. Furthermore, in practice, the reliability of the user’s response has been found to be highly variable. In addition, due to the sparsity of the reward signal and the large search space, reinforcement learning-based dialogue policy optimisation is often slow. This thesis presents several approaches to address these problems. To better evaluate a dialogue for policy optimisation, two methods are proposed. First, a recurrent neural network-based predictor pre-trained on off-line data is proposed to estimate task success during subsequent on-line dialogue policy learning, avoiding noisy user ratings and problems related to not knowing the user’s goal. Second, an on-line learning framework is described in which a dialogue policy is jointly trained alongside a reward function modelled as a Gaussian process with active learning. This mitigates the noisiness of user ratings and minimises user intrusion. It is shown that both the off-line and on-line methods achieve practical policy learning in real-world applications, while the latter provides a more general joint learning system directly from users. To increase policy learning speed, the use of reward shaping is explored and shown to be effective and complementary to the core policy learning algorithm. Furthermore, as deep reinforcement learning methods have the potential to scale to very large tasks, this thesis also investigates their application to dialogue systems.
Two sample-efficient algorithms, trust region actor-critic with experience replay (TRACER) and episodic natural actor-critic with experience replay (eNACER), are introduced. In addition, a corpus of demonstration data is utilised to pre-train the models prior to on-line reinforcement learning to handle the cold start problem. Combining these two methods, a practical approach is demonstrated to effectively learn deep reinforcement learning-based dialogue policies in a task-oriented information seeking domain. Overall, this thesis provides solutions which allow truly on-line and continuous policy learning in spoken dialogue systems.
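The abstract reports that reward shaping sped up policy learning but does not give the shaping function. One standard form is potential-based shaping (Ng, Harada and Russell, 1999), which adds F(s, s') = gamma * Phi(s') - Phi(s) to each reward and leaves the optimal policy unchanged. A minimal Python sketch, assuming a hypothetical slot-filling potential and a hypothetical per-turn reward; none of these names or values come from the thesis.

```python
def shaped_reward(reward, phi_s, phi_next, gamma, terminal):
    """Add the potential-based shaping term F = gamma * Phi(s') - Phi(s).

    Terminal states get potential 0 so the shaping terms telescope away.
    """
    if terminal:
        phi_next = 0.0
    return reward + gamma * phi_next - phi_s


def slot_potential(filled_slots, total_slots):
    """Hypothetical potential: fraction of the user's constraints captured so far."""
    return filled_slots / total_slots


def discounted_return(rewards, potentials, gamma=0.99, shaped=True):
    """Discounted return of one dialogue.

    potentials[t] is Phi(s_t), so len(potentials) == len(rewards) + 1.
    """
    total = 0.0
    for t, r in enumerate(rewards):
        if shaped:
            terminal = (t == len(rewards) - 1)
            r = shaped_reward(r, potentials[t], potentials[t + 1], gamma, terminal)
        total += (gamma ** t) * r
    return total


# Hypothetical 3-turn dialogue: -1 per turn, +20 on success, one slot filled per turn.
potentials = [slot_potential(k, 3) for k in range(4)]
rewards = [-1.0, -1.0, 20.0]
print(discounted_return(rewards, potentials, 0.9, shaped=False),
      discounted_return(rewards, potentials, 0.9, shaped=True))
```

Intermediate turns now receive denser feedback, yet over a whole episode the shaping terms telescope to -Phi(s_0). The discounted return of every dialogue shifts by the same start-state constant, which is why the ranking of policies, and hence the optimal policy, is preserved.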
|