1 |
Feedback Related Negativity: Reward Prediction Error or Salience Prediction Error?
Heydari, Sepideh 07 April 2015
The reward positivity is a component of the human event-related brain potential (ERP) elicited by feedback stimuli in trial-and-error learning and guessing tasks. A prominent theory holds that the reward positivity reflects a reward prediction error that is differentially sensitive to the valence of the outcomes, namely, larger for unexpected positive events relative to unexpected negative events (Holroyd & Coles, 2002). Although the theory has found substantial empirical support, most of these studies have utilized either monetary or performance feedback to test the hypothesis. However, in apparent contradiction to the theory, a recent study found that unexpected physical punishments (a shock to the finger) also elicit the reward positivity (Talmi, Atkinson, & El-Deredy, 2013). Accordingly, these investigators argued that this ERP component reflects a salience prediction error rather than a reward prediction error. To investigate this finding further, I adapted the task paradigm by Talmi and colleagues to a more standard guessing task often used to investigate the reward positivity. Participants navigated a virtual T-maze and received feedback on each trial under two conditions. In a reward condition the feedback indicated that they would either receive a monetary reward or not for their performance on that trial. In a punishment condition the feedback indicated that they would receive a small shock or not at the end of the trial. I found that the feedback stimuli elicited a typical reward positivity in the reward condition and an apparently delayed reward positivity in the punishment condition. Importantly, this signal was more positive to the stimuli that predicted the omission of a possible punishment relative to stimuli that predicted a forthcoming punishment, which is inconsistent with the salience hypothesis. / Graduate / 0633 / 0317 / heydari@uvic.ca
|
2 |
Computational modelling of the neural systems involved in schizophrenia
Thurnham, A. J. January 2008
The aim of this thesis is to improve our understanding of the neural systems involved in schizophrenia by suggesting possible avenues for future computational modelling in an attempt to make sense of the vast number of studies relating to the symptoms and cognitive deficits relating to the disorder. This multidisciplinary research has covered three different levels of analysis: abnormalities in the microscopic brain structure, dopamine dysfunction at a neurochemical level, and interactions between cortical and subcortical brain areas, connected by cortico-basal ganglia circuit loops; and has culminated in the production of five models that provide useful clarification in this difficult field. My thesis comprises three major relevant modelling themes. Firstly, in Chapter 3 I looked at an existing neural network model addressing the Neurodevelopmental Hypothesis of Schizophrenia by Hoffman and McGlashan (1997). However, it soon became clear that such models were overly simplistic and brittle when it came to replication. While they focused on hallucinations and connectivity in the frontal lobes they ignored other symptoms and the evidence of reductions in volume of the temporal lobes in schizophrenia. No mention was made of the considerable evidence of dysfunction of the dopamine system and associated areas, such as the basal ganglia. This led to my second line of reasoning: dopamine dysfunction. Initially I helped create a novel model of dopamine neuron firing based on the Computational Substrate for Incentive Salience by McClure, Daw and Montague (2003), incorporating temporal difference (TD) reward prediction errors (Chapter 5). I adapted this model in Chapter 6 to address the ongoing debate as to whether or not dopamine encodes uncertainty in the delay period between presentation of a conditioned stimulus and receipt of a reward, as demonstrated by sustained activation seen in single dopamine neuron recordings (Fiorillo, Tobler & Schultz 2003). 
An answer to this question could result in a better understanding of the nature of dopamine signaling, with implications for the psychopathology of cognitive disorders, like schizophrenia, for which dopamine is commonly regarded as having a primary role. Computational modelling enabled me to suggest that while sustained activation is common in single trials, there is the possibility that it increases with increasing probability, in which case dopamine may not be encoding uncertainty in this manner. Importantly, these predictions can be tested and verified by experimental data. My third modelling theme arose as a result of the limitations to using TD alone to account for a reinforcement learning account of action control in the brain. In Chapter 8 I introduce a dual weighted artificial neural network, originally designed by Hinton and Plaut (1987) to address the problem of catastrophic forgetting in multilayer artificial neural networks. I suggest an alternative use for a model with fast and slow weights to address the problem of arbitration between two systems of control. This novel approach is capable of combining the benefits of model free and model based learning in one simple model, without need for a homunculus and may have important implications in addressing how both goal directed and stimulus response learning may coexist. Modelling cortical-subcortical loops offers the potential of incorporating both the symptoms and cognitive deficits associated with schizophrenia by taking into account the interactions between midbrain/striatum and cortical areas.
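The temporal difference (TD) reward prediction error mentioned in this abstract can be illustrated with a minimal sketch. It assumes a tabular TD(0) learner over a fixed sequence of time steps within a trial; the function names, learning rate `alpha`, and discount `gamma` are illustrative assumptions and are not taken from the thesis.

```python
def td_errors(values, rewards, gamma=1.0):
    """Return TD errors delta_t = r_t + gamma * V(t+1) - V(t).

    values[t] is the current value estimate at time step t;
    the value after the final step is assumed to be 0 (end of trial).
    """
    deltas = []
    for t in range(len(rewards)):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        deltas.append(rewards[t] + gamma * next_value - values[t])
    return deltas


def td_update(values, rewards, alpha=0.1, gamma=1.0):
    """Return new value estimates after one trial of TD(0) learning."""
    deltas = td_errors(values, rewards, gamma)
    return [v + alpha * d for v, d in zip(values, deltas)]
```

Under these assumptions, repeated calls to `td_update` on a trial whose reward arrives at the last step move the prediction error backwards in time, from the reward towards the earlier, predictive time steps.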
|
3 |
NEUROBEHAVIORAL MEASUREMENTS OF NATURAL AND OPIOID REWARD VALUE
Smith, Aaron Paul 01 January 2019
In the last decade, (non)prescription opioid abuse, opioid use disorder (OUD) diagnoses, and opioid-related overdoses have risen and represent a significant public health concern. One method of understanding OUD is as a disorder of choice that requires choosing opioid rewards at the expense of other nondrug rewards. The characterization of OUD as a disorder of choice is important as it implicates decision-making processes as therapeutic targets, such as the valuation of opioid rewards. However, reward-value measurement and interpretation are traditionally different in substance abuse research compared to related fields such as economics, animal behavior, and neuroeconomics and may be less effective for understanding how opioid rewards are valued. The present research therefore used choice procedures in line with behavioral/neuroeconomic studies to determine if drug-associated decision making could be predicted from economic choice theories. In Experiment 1, rats completed an isomorphic food-food probabilistic choice task with dynamic, unpredictable changes in reward probability that required constant updating of reward values. After initial training, the reward magnitude of one choice subsequently increased from one to two to three pellets. Additionally, rats were split between the Signaled and Unsignaled groups to understand how cues modulate reward value. After each choice, the Unsignaled group received distinct choice-dependent cues that were uninformative of the choice outcome. The Signaled group also received uninformative cues on one option, but the alternative choice produced reward-predictive cues that informed the trial outcome as a win or loss. Choice data were analyzed at a molar level using matching equations and at a molecular level using reinforcement learning (RL) models to determine how probability, reward magnitude, and reward-associated cues affected choice.
Experiment 2 used an allomorphic drug versus food procedure where the food reward for one option was replaced by a self-administered remifentanil (REMI) infusion at doses of 1, 3 and 10 μg/kg. Finally, Experiment 3 assessed the potential for both REMI and food reward value to be commonly scaled within the brain by examining changes in nucleus accumbens (NAc) Oxygen (O2) dynamics. Results showed that increasing reward probability, magnitude, and the presence of reward-associated cues all independently increased the propensity of choosing the associated choice alternative, including REMI drug choices. Additionally, both molar matching and molecular RL models successfully parameterized rats’ decision dynamics. O2 dynamics were generally commensurate with the idea of a common value signal for REMI and food with changes in O2 signaling scaling with the reward magnitude of REMI rewards. Finally, RL model-derived reward prediction errors significantly correlated with peak O2 activity for reward delivery, suggesting a possible neurological mechanism of value updating. Results are discussed in terms of their implications for current conceptualizations of substance use disorders including a potential need to change the discourse surrounding how substance use disorders are modeled experimentally. Overall, the present research provides evidence that a choice model of substance use disorders may be a viable alternative to the disease model and could facilitate future treatment options centered around economic principles.
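The abstract does not say which RL model was fitted, so the following is only a hedged sketch of how trial-by-trial reward prediction errors are commonly derived from choice data. It assumes a simple delta-rule value update with a softmax choice rule; the function names and the parameters `alpha` (learning rate) and `beta` (inverse temperature) are hypothetical.

```python
import math


def softmax_choice_probs(q_values, beta=1.0):
    """Convert option values into choice probabilities (softmax rule)."""
    top = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (q - top)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def update_chosen_value(q_values, choice, reward, alpha=0.1):
    """Apply a delta-rule update to the chosen option only.

    Returns the new value list and the reward prediction error
    (obtained reward minus the value expected for the chosen option).
    """
    rpe = reward - q_values[choice]
    new_q = list(q_values)
    new_q[choice] += alpha * rpe
    return new_q, rpe
```

In a model of this kind, the per-trial `rpe` values are the quantities that would be compared against a neural measure such as peak O2 activity at reward delivery.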
|
4 |
An electrophysiological investigation of reward prediction errors in the human brain
Sambrook, Thomas January 2015
Reward prediction errors are quantitative signed terms that express the difference between the value of an obtained outcome and the expected value that was placed on it prior to its receipt. Positive reward prediction errors constitute reward; negative reward prediction errors constitute punishment. Reward prediction errors have been shown to be powerful drivers of reinforcement learning in formal models and there is thus a strong reason to believe they are used in the brain. Isolating such neural signals stands to help elucidate how reinforcement learning is implemented in the brain, and may ultimately shed light on individual differences, psychopathologies of reward such as addiction and depression, and the apparently non-normative behaviour under risk described by behavioural economics. In the present thesis, I used the event related potential technique to isolate and study electrophysiological components whose behaviour resembled reward prediction errors. I demonstrated that a candidate component, “feedback related negativity”, occurring 250 to 350 ms after receipt of reward or punishment, showed such behaviour. A meta-analysis of the existing literature on this component, using a novel technique of “great grand averaging”, supported this view. The component showed marked asymmetries, however, being more responsive to reward than punishment and more responsive to appetitive rather than aversive outcomes. I also used novel data-driven techniques to examine activity outside the temporal interval associated with the feedback related negativity. This revealed a later component responding solely to punishments incurred in a Pavlovian learning task. It also revealed numerous salience-encoding components which were sensitive to a prediction error’s size but not its sign.
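The distinction drawn in this abstract between a signed reward prediction error and a salience signal that tracks only its size can be written out as a small sketch. Treating salience as the absolute value of the prediction error is an assumption made here for illustration; the function names are hypothetical.

```python
def reward_prediction_error(obtained, expected):
    """Signed error: positive when the outcome is better than expected."""
    return obtained - expected


def salience_prediction_error(obtained, expected):
    """Unsigned error: sensitive to the error's size but not its sign."""
    return abs(obtained - expected)
```

For example, with an expected value of 0.25, an outcome of 1.0 and an outcome of 0.0 give signed errors of opposite sign, whereas the unsigned measure is positive in both cases.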
|