1. Behavioural and brain mechanisms of predictive fear learning in the rat. Cole, Sindy, Psychology, Faculty of Science, UNSW, January 2009.
The experiments reported in this thesis studied the contributions of opioid and NMDA receptors to predictive fear learning, as measured by freezing in the rat. The first series of experiments (Chapter 2) used a within-subject one-trial blocking design to study whether opioid receptors mediate a direct action of predictive error on Pavlovian association formation. Systemic administration of the opioid receptor antagonist naloxone, or intra-vlPAG administration of the selective μ-opioid receptor antagonist CTAP, prior to Stage II training prevented one-trial blocking. These results show for the first time that opioid receptors mediate the direct actions of predictive error on Pavlovian association formation. The second series of experiments (Chapter 3) studied temporal-difference prediction errors during Pavlovian fear conditioning. In Stage I rats received CSA → shock pairings. In Stage II they received CSA/CSB → shock pairings that blocked learning to CSB. In Stage III, a serial overlapping compound, CSB → CSA, was followed by shock. The change in intra-trial durations supported fear learning to CSB but reduced fear of CSA, revealing the selective operation of temporal-difference prediction errors. This bi-directional change in responding was prevented by systemic NMDA receptor antagonism prior to Stage III training. In contrast, opioid receptor antagonism differentially affected the learning taking place during Stage III, enhancing learning to CSB while impairing the loss of fear to CSA. The final series of experiments (Chapter 4) examined potential neuroanatomical loci for the systemic effects reported in Chapter 3. Intra-BLA infusion of ifenprodil, an antagonist of NMDA receptors containing the NR2B subunit, prevented all learning during Stage III, whereas intra-vlPAG infusion of the μ-opioid receptor antagonist CTAP facilitated learning to CSB but impaired learning to CSA. These results are consistent with the suggestion that opioid receptors in the vlPAG make an important contribution to learning, a contribution over and above the vlPAG's role in producing the freezing conditioned response. Furthermore, the findings of this thesis identify complementary but dissociable roles for amygdala NMDA receptors and vlPAG μ-opioid receptors in predictive fear learning.
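The temporal-difference account tested in Chapter 3 can be illustrated with a toy computation. The sketch below is a hypothetical illustration rather than the thesis's analysis: a TD(0) value update applied to a trial treated as the state sequence CSB → CSA → shock, showing why prediction error, and therefore new learning, migrates to the earliest reliable predictor.

```python
# Hypothetical TD(0) illustration, not the thesis's analysis: values are shock
# expectancies attached to stimulus states, updated by the prediction error
# delta = r + gamma * V(next) - V(current).

alpha, gamma = 0.3, 0.9          # assumed learning rate and discount
V = {"CSB": 0.0, "CSA": 0.0}

# Stages I-II: CSA alone is paired with shock, so V(CSA) approaches 1,
# while a simultaneously trained CSB stays blocked at zero.
for _ in range(100):
    V["CSA"] += alpha * (1.0 - V["CSA"])

# Stage III: serial compound CSB -> CSA -> shock. At CSB onset the error
# gamma * V(CSA) - V(CSB) is positive, so the previously blocked CSB now
# acquires fear; on the TD account, the altered intra-trial timing likewise
# yields a negative error at CSA's old shock time, consistent with the
# reported loss of fear to CSA.
for trial in range(1, 6):
    delta_B = gamma * V["CSA"] - V["CSB"]
    V["CSB"] += alpha * delta_B
    print(f"Stage III trial {trial}: delta(CSB) = {delta_B:+.2f}, V(CSB) = {V['CSB']:.2f}")
```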
2. Reinforcement Learning and Simulation-Based Search in Computer Go. Silver, David, 2009.
Learning and planning are two fundamental problems in artificial intelligence. The learning problem can be tackled by reinforcement learning methods, such as temporal-difference learning, which update a value function from real experience, and use function approximation to generalise across states. The planning problem can be tackled by simulation-based search methods, such as Monte-Carlo tree search, which update a value function from simulated experience, but treat each state individually. We introduce a new method, temporal-difference search, that combines elements of both reinforcement learning and simulation-based search methods. In this new method the value function is updated from simulated experience, but it uses function approximation to efficiently generalise across states. We also introduce the Dyna-2 architecture, which combines temporal-difference learning with temporal-difference search. Whereas temporal-difference learning acquires general domain knowledge from its past experience, temporal-difference search acquires local knowledge that is specialised to the agent's current state, by simulating future experience. Dyna-2 combines both forms of knowledge together.
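The update shared by temporal-difference learning and temporal-difference search can be sketched compactly. The following is a minimal illustration, not Silver's implementation: TD(0) with a value function that is a linearly weighted sum of binary features, where the two methods differ only in whether the transitions fed to the update come from real games or from simulations launched from the agent's current state.

```python
import numpy as np

# Minimal TD(0) with linear function approximation over binary features.
# A hypothetical sketch of the update shared by temporal-difference learning
# (real experience) and temporal-difference search (simulated experience).

n_features = 1_000_000          # e.g. binary indicators for local stone patterns
theta = np.zeros(n_features)    # one weight per feature

def value(active_features):
    """V(s) = sum of the weights of the features active in state s."""
    return theta[active_features].sum()

def td0_update(phi_s, reward, phi_s_next, alpha=0.01, gamma=1.0):
    """One TD(0) step: theta <- theta + alpha * delta * grad V(s)."""
    delta = reward + gamma * value(phi_s_next) - value(phi_s)
    theta[phi_s] += alpha * delta   # gradient is 1 for each active binary feature
    return delta

# Usage: in TD learning the (s, r, s') transitions come from real games; in TD
# search they come from self-play simulations started from the current state,
# so the same update serves both. Feature indices here are a toy example.
phi_s = np.array([3, 17, 42])
phi_s_next = np.array([3, 99, 512])
td0_update(phi_s, reward=0.0, phi_s_next=phi_s_next)
```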
We apply our algorithms to the game of 9x9 Go. Using temporal-difference learning, with a million binary features matching simple patterns of stones, and using no prior knowledge except the grid structure of the board, we learnt a fast and effective evaluation function. Using temporal-difference search with the same representation produced a dramatic improvement: without any explicit search tree, and with equivalent domain knowledge, it achieved better performance than a vanilla Monte-Carlo tree search. When combined together using the Dyna-2 architecture, our program outperformed all handcrafted, traditional search, and traditional machine learning programs on the 9x9 Computer Go Server.
We also use our framework to extend the Monte-Carlo tree search algorithm. By forming a rapid generalisation over subtrees of the search space, and incorporating heuristic pattern knowledge that was learnt or handcrafted offline, we were able to significantly improve the performance of the Go program MoGo. Using these enhancements, MoGo became the first 9x9 Go program to achieve human master level.
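The "rapid generalisation over subtrees" refers to the all-moves-as-first (RAVE) heuristic published for MoGo. The sketch below shows one common schematic way of blending the fast but biased RAVE estimate with the slower, unbiased Monte-Carlo value; the weighting schedule is an illustrative assumption, not MoGo's exact one.

```python
# Schematic blend of Monte-Carlo and RAVE (all-moves-as-first) value estimates.
# The weighting schedule is a common illustrative choice, not MoGo's exact one.

def blended_value(q_mc: float, n_mc: int, q_rave: float, n_rave: int,
                  k: float = 1000.0) -> float:
    """Trust the RAVE estimate while few true simulations exist, then shift
    weight to the unbiased Monte-Carlo estimate as its visit count grows."""
    beta = n_rave / (n_rave + n_mc + n_rave * n_mc / k)   # weight on RAVE
    return beta * q_rave + (1.0 - beta) * q_mc

# Early in search the RAVE value dominates; later the MC value takes over.
print(blended_value(q_mc=0.4, n_mc=5, q_rave=0.6, n_rave=200))     # near 0.6
print(blended_value(q_mc=0.4, n_mc=5000, q_rave=0.6, n_rave=200))  # near 0.4
```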
3. Reinforcement Learning and Simulation-Based Search in Computer Go. Silver, David. Unknown date.
No description available.
4. The Impact of Threat on Behavioral and Neural Markers of Learning in Anxiety. Valdespino, Andrew, 28 August 2019.
Anxiety is characterized by apprehensive expectation regarding the forecasted outcomes of choice. Decision science, and in particular reinforcement learning models, provides a quantitative framework to explain how the likelihood and value of such outcomes are estimated, thus allowing the measurement of decision-making parameters that may differ between high- and low-anxiety groups. However, the role of anxiety in choice allocation is not sufficiently understood, particularly regarding the influence of transient threat on current decisions. The presence of threat appears to alter choice behavior and may differentially influence quantitatively derived learning parameters among anxious individuals. Regarding the neurobiology of reinforcement learning, the dorsolateral prefrontal cortex (dlPFC) has been suggested to play a role in temporally integrating experienced outcomes, as well as in coordinating an overall choice action plan; these functions can be described computationally by learning rate and exploration parameters, respectively. Accordingly, it was hypothesized that high trait anxiety would be associated with a lower reward learning rate, a higher loss learning rate, and diminished exploration of available options, and furthermore that threat would increase the magnitude of these parameters in the high anxiety group. We also hypothesized that the magnitude of neural activation (measured by functional near-infrared spectroscopy; FNIRS) across dissociable regions of the left and right dlPFC would be associated with model parameters, and that threat would further increase the magnitude of activation to model parameters. Finally, it was hypothesized that reward and loss outcomes could be differentiated based on FNIRS channel activation, and that a distinct set of channels would differentiate outcomes in high relative to low anxiety groups. To test these hypotheses, a temporal-difference learning model was applied to a decision-making (bandit) task to establish differences in learning-parameter magnitudes between individuals high (N=26) and low (N=20) in trait anxiety, as well as the impact of threat on learning parameters.
Results indicated a positive association between anxiety and both the reward and loss learning rate parameters. However, threat was not found to impact model parameters. Imaging results indicated a positive association between exploration and the left dlPFC. Reward and loss outcomes were successfully differentiated in the high, but not low anxiety group.
Results add to a growing literature suggesting that anxiety is characterized by differential sensitivity to both losses and rewards in reinforcement-learning contexts, and further suggest that the dlPFC plays a role in modulating exploration-based choice strategies.

Doctor of Philosophy

Anxiety is characterized by worry about possible future negative outcomes. Mathematical models in the area of learning theory allow the representation and measurement of individual differences in decision-making tendencies that contribute to apprehension about the future. Currently, the role of anxiety in the allocation of choices, and particularly the influence of threat on decision-making, is poorly understood. Threat may influence learning and alter choice behavior, together producing apprehension about negative future outcomes. With regard to how such decision-making is computed in the brain, the dorsolateral prefrontal cortex (dlPFC) has been suggested to play a role in tracking and integrating current and past experienced outcomes in order to coordinate an overall action plan. Outcome tracking and action-plan coordination can be represented mathematically within a learning-theory framework by learning rate and exploration parameters, respectively. It was hypothesized that high anxiety would be associated with a lower reward learning rate, a higher loss learning rate, and diminished exploration, and furthermore that threat would increase the magnitude of these tendencies in anxious individuals. We also hypothesized that brain activation in the dlPFC would be associated with these tendencies, and that threat would further increase activation in these brain areas. It was also hypothesized that reward and loss outcomes could be differentiated based on brain activation in the dlPFC. To test these hypotheses, a mathematical model was applied to establish differences in learning within high and low anxiety individuals, as well as to test the impact of threat on these learning tendencies. Results indicated a positive association between anxiety and the rate of learning from both reward and loss outcomes. Threat was not found to impact these learning rates. A positive association was found between activation in the dlPFC and the tendency to explore. Reward and loss outcomes were successfully differentiated based on brain activation in high, but not low, anxiety individuals. Results add to a growing literature suggesting that anxiety is characterized by differential sensitivity to both losses and rewards, and further add to our understanding of how the brain computes exploration-based choice strategies.
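The parameterisation described above can be made concrete. The code below is a hypothetical illustration of this class of model, not the study's analysis code: a bandit learner with separate learning rates for reward and loss outcomes and a softmax inverse-temperature parameter governing exploration.

```python
import numpy as np

# Hypothetical sketch of the model class described above: a bandit learner with
# separate learning rates for reward and loss outcomes and softmax exploration.
# Parameter names and values are illustrative, not the study's fitted values.

rng = np.random.default_rng(0)

def softmax(q, beta):
    """Choice probabilities over arms; lower beta = more exploration."""
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

def simulate(alpha_reward, alpha_loss, beta, payoffs, n_trials=200):
    q = np.zeros(len(payoffs))                 # value estimate per bandit arm
    for _ in range(n_trials):
        arm = rng.choice(len(q), p=softmax(q, beta))
        outcome = rng.choice(payoffs[arm])     # sample a reward or a loss
        alpha = alpha_reward if outcome >= 0 else alpha_loss
        q[arm] += alpha * (outcome - q[arm])   # valence-dependent update
    return q

# Two arms: one mostly rewarding, one mostly punishing (toy payoff sets).
payoffs = [np.array([1.0, 1.0, -1.0]), np.array([-1.0, -1.0, 1.0])]
print(simulate(alpha_reward=0.3, alpha_loss=0.5, beta=3.0, payoffs=payoffs))
```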
5. Gradient Temporal-Difference Learning Algorithms. Maei, Hamid Reza. Unknown date.
No description available.
6. Defensive avoidance in paranoid delusions: experimental and computational approaches. Moutoussis, Michael, January 2011.
This abstract summarises the thesis entitled Defensive Avoidance in Paranoid Delusions: Experimental and Computational Approaches, submitted by Michael Moutoussis to The University of Manchester for the degree of Doctor of Philosophy (PhD) in the Faculty of Medical and Human Sciences, in 2011.

The possible aetiological role of defensive avoidance in paranoia was investigated in this work. First the psychological significance of the Conditioned Avoidance Response (CAR) was reappraised. The CAR activates normal threat-processing mechanisms that may be pathologically over-activated in the anticipation of threats in paranoia. This may apply both to external threats and to threats to self-esteem.

A temporal-difference computational model of the CAR suggested that a dopamine-independent process may signal that a particular state has led to a worse-than-expected outcome. In contrast, learning about actions is likely to involve dopamine in signalling both worse-than-expected and better-than-expected outcomes. The psychological mode of action of dopamine-blocking drugs may involve dampening (1) the vigour of the avoidance response and (2) the prediction-error signals that drive action learning.

Excessive anticipation of negative events might lead to inappropriately perceived high costs of delaying decisions. Efforts to avoid such costs might explain the Jumping-to-Conclusions (JTC) bias found in paranoid patients. Two decision-theoretical models were used to analyse data from the ‘beads-in-a-jar’ task. One model employed an ideal-observer Bayesian approach; a control model made decisions by weighing evidence against a fixed threshold of certainty. We found no support for our ‘high cost’ hypothesis. According to both models the JTC bias was better explained by higher levels of ‘cognitive noise’ (relative to motivation) in paranoid patients. This ‘noise’ appears to limit the ability of paranoid patients to be influenced by cognitively distant possibilities.

It was further hypothesised that excessive avoidance of negative aspects of the self may fuel paranoia. This was investigated empirically. Important self-attributes were elicited in paranoid patients and controls. Conscious and non-conscious avoidance were assessed while negative thoughts about the self were presented. Both ‘deserved’ and ‘undeserved’ persecutory beliefs were associated with high avoidance/control strategies in general, but not with increased avoidance of negative thoughts about the self. On the basis of the present studies the former is therefore considerably more likely than the latter to play an aetiological role in paranoia.

This work has introduced novel computational methods, especially useful in the study of ‘hidden’ psychological variables. It supported and deepened some key hypotheses about paranoia and provided consistent evidence against other important aetiological hypotheses. These contributions have substantial implications for research and for some aspects of clinical practice.
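The beads-in-a-jar analysis lends itself to a short sketch. The code below is a hedged reconstruction under assumed parameter values, not the thesis's fitted models: an ideal observer updates the posterior over the two jars after each bead, and the simpler control model declares a decision once that posterior crosses a fixed certainty threshold.

```python
# Hypothetical sketch of the 'beads-in-a-jar' task analysis: Bayesian posterior
# updating plus the fixed-certainty-threshold control model. The bead proportion
# and threshold are illustrative, not the thesis's fitted parameters.

def posterior_jar_A(beads, p=0.85, prior=0.5):
    """P(jar A | beads) for jars holding p and 1-p proportions of 'a' beads."""
    post = prior
    for bead in beads:
        like_A = p if bead == "a" else 1 - p
        like_B = 1 - p if bead == "a" else p
        post = like_A * post / (like_A * post + like_B * (1 - post))
    return post

def draws_to_decide(beads, threshold=0.9):
    """Fixed-threshold control model: stop once either jar exceeds threshold."""
    post = 0.5
    for n in range(1, len(beads) + 1):
        post = posterior_jar_A(beads[:n])
        if post >= threshold or post <= 1 - threshold:
            return n, post
    return len(beads), post

# A jumping-to-conclusions pattern can be mimicked by a lower threshold (or, as
# the thesis argues, by added decision noise rather than high sampling costs).
print(draws_to_decide(list("aabaa"), threshold=0.9))
```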
7. Context and Salience: The Role of Dopamine in Reward Learning and Neuropsychiatric Disorders. Toulouse, Trent M.
Evidence suggests that a change in the firing rate of dopamine (DA) cells is a major neurobiological correlate of learning. The Temporal Difference (TD) learning algorithm provides a popular account of the DA signal as conveying the error between expected and actual rewards. Other accounts have attempted to code the DA firing pattern as conveying surprise or salience. DA-mediated cells have also been implicated in several neuropsychiatric disorders such as obsessive-compulsive disorder and schizophrenia. Compelling neuropsychological explanations of the DA signal also frame it as conveying salience. A model-based reinforcement learning algorithm using a salience signal analogous to dopamine neurons was built and used to model existing animal behavioral data.

Different reinforcement learning models were then compared under conditions of altered DA firing patterns. Several differing predictions of the TD model and the salience model were compared against animal behavioral data in an obsessive-compulsive disorder (OCD) model using a dopamine agonist. The results show that the salience model predictions more accurately model actual animal behavior.

The role of context in the salience model differs from that in the standard TD-learning algorithm. Several predictions of the salience model for how people should respond to context shifts of differing salience were tested against known behavioral correlates of endogenous dopamine levels. As predicted, individuals with behavioral traits correlated with higher endogenous dopamine levels are far more sensitive to low-salience context shifts than those with correlates of lower endogenous dopamine levels. This is a unique prediction of the salience model for the DA signal, which allows for better integration of reinforcement learning models and neuropsychological frameworks for discussing the role of dopamine in learning, memory and behavior.

Doctor of Science (PhD)
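The contrast drawn above between error coding and salience coding can be stated in a few lines. The sketch below uses one common formalisation, the unsigned prediction error, as a stand-in for salience; the thesis's model-based salience signal may differ in detail.

```python
# Hedged sketch of the two signal codings contrasted above: the TD account
# sends the signed prediction error, while a salience-style account sends
# something like its magnitude (one common formalisation, used here as a
# stand-in for the thesis's model-based salience signal).

def td_error(reward, v_next, v_now, gamma=0.95):
    return reward + gamma * v_next - v_now           # signed: better/worse than expected

def salience_signal(reward, v_next, v_now, gamma=0.95):
    return abs(td_error(reward, v_next, v_now, gamma))  # unsigned: how surprising

# An omitted expected reward (reward = 0 where v_now was high) pulls the two
# signals apart: the TD error goes negative, the salience signal stays positive.
print(td_error(0.0, 0.0, 0.8))         # -0.8
print(salience_signal(0.0, 0.0, 0.8))  #  0.8
```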
8. Reinforcement learning: theory, methods and application to decision support systems. Mouton, Hildegarde Suzanne, 2010.
Thesis (MSc (Applied Mathematics))--University of Stellenbosch, 2010.

ENGLISH ABSTRACT: In this dissertation we study the machine learning subfield of Reinforcement Learning (RL). After developing a coherent background, we apply a Monte Carlo (MC) control algorithm with exploring starts (MCES), as well as an off-policy Temporal-Difference (TD) learning control algorithm, Q-learning, to a simplified version of the Weapon Assignment (WA) problem.

For the MCES control algorithm, a discount parameter of τ = 1 is used. This gives very promising results when applied to 7 × 7 grids, as well as 71 × 71 grids. The same discount parameter cannot be applied to the Q-learning algorithm, as it causes the Q-values to diverge. We take a greedy approach, setting ε = 0, and vary the learning rate (α) and the discount parameter (τ). Experimentation shows that the best results are found with α set to 0.1 and τ constrained in the region 0.4 ≤ τ ≤ 0.7.

The MC control algorithm with exploring starts gives promising results when applied to the WA problem. It performs significantly better than the off-policy TD algorithm, Q-learning, even though it is almost twice as slow.

The modern battlefield is a fast-paced, information-rich environment, where discovery of intent, situation awareness and the rapid evolution of concepts of operation and doctrine are critical success factors. Combining the techniques investigated and tested in this work with other techniques in Artificial Intelligence (AI) and modern computational techniques may hold the key to solving some of the problems we now face in warfare.

AFRIKAANSE OPSOMMING: The focus of this dissertation is the machine-learning algorithms in the field of reinforcement learning. A coherent background of the field is followed by the application of a Monte Carlo (MC) control algorithm with exploring starts, as well as an off-policy Temporal-Difference control algorithm, Q-learning, to a simplified version of the weapon assignment problem.

For the MC control algorithm a discount parameter of τ = 1 is used. This yields promising results when applied to 7 × 7 grids, as well as to 71 × 71 grids. The same discount parameter cannot be applied to the Q-learning algorithm, since it causes the Q-values to diverge. We take a greedy approach by setting the greediness parameter to ε = 0. We then vary the learning rate (α) and the discount parameter (τ). The best experimental results were obtained with α = 0.1 and with the discount parameter held in the region 0.4 ≤ τ ≤ 0.7.

The MC control algorithm yields promising results when applied to the weapon assignment problem. It delivers significantly better results than the Q-learning algorithm, even though it takes about twice as long to run.

The modern battlefield is an environment rich in information, where it is critically important to understand the enemy's plans quickly, to be aware of the surroundings and the context of events, and where the rapid development of concepts of operation and doctrine leads to success. The techniques investigated and tested in this dissertation, combined with other artificial intelligence techniques and modern computational techniques, may hold the key to solving the problems we currently face in warfare.
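The reported settings translate directly into the standard tabular Q-learning update. The sketch below is an illustrative reconstruction under assumed details (a toy 7 × 7 grid whose single goal cell stands in for the weapon assignment problem), using α = 0.1, a discount τ = 0.5 from the reported band 0.4 ≤ τ ≤ 0.7, and the greedy ε = 0 policy with ties broken at random.

```python
import numpy as np

# Illustrative reconstruction, not the thesis code: tabular Q-learning with the
# reported parameters (alpha = 0.1, discount tau = 0.5, greedy epsilon = 0) on a
# toy 7 x 7 grid whose goal cell stands in for a successful weapon assignment.

rng = np.random.default_rng(1)
N, GOAL = 7, (6, 6)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]       # up, down, left, right
Q = np.zeros((N, N, len(ACTIONS)))
alpha, tau = 0.1, 0.5

def step(s, a):
    """Move within the grid; reward 1 on reaching the goal, 0 otherwise."""
    r = min(max(s[0] + ACTIONS[a][0], 0), N - 1)
    c = min(max(s[1] + ACTIONS[a][1], 0), N - 1)
    s2 = (r, c)
    return s2, (1.0 if s2 == GOAL else 0.0)

for episode in range(500):
    s = (0, 0)
    for _ in range(1000):                          # step cap per episode
        # Greedy (epsilon = 0) choice; ties between equal Q-values broken at random.
        a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s2, reward = step(s, a)
        # Off-policy Q-learning update with the max over next-state actions.
        Q[s][a] += alpha * (reward + tau * Q[s2].max() - Q[s][a])
        s = s2
        if s == GOAL:
            break

print("Greedy value at start state:", Q[0, 0].max())
```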
9. Computational modelling of the neural systems involved in schizophrenia. Thurnham, A. J., January 2008.
The aim of this thesis is to improve our understanding of the neural systems involved in schizophrenia by suggesting possible avenues for future computational modelling, in an attempt to make sense of the vast number of studies relating to the symptoms and cognitive deficits of the disorder. This multidisciplinary research has covered three different levels of analysis: abnormalities in microscopic brain structure, dopamine dysfunction at a neurochemical level, and interactions between cortical and subcortical brain areas connected by cortico-basal ganglia circuit loops; it has culminated in the production of five models that provide useful clarification in this difficult field.

My thesis comprises three major modelling themes. Firstly, in Chapter 3 I looked at an existing neural network model addressing the Neurodevelopmental Hypothesis of Schizophrenia by Hoffman and McGlashan (1997). However, it soon became clear that such models were overly simplistic and brittle when it came to replication. While they focused on hallucinations and connectivity in the frontal lobes, they ignored other symptoms and the evidence of reductions in volume of the temporal lobes in schizophrenia. No mention was made of the considerable evidence of dysfunction of the dopamine system and associated areas, such as the basal ganglia.

This led to my second line of reasoning: dopamine dysfunction. Initially I helped create a novel model of dopamine neuron firing based on the Computational Substrate for Incentive Salience by McClure, Daw and Montague (2003), incorporating temporal difference (TD) reward prediction errors (Chapter 5). I adapted this model in Chapter 6 to address the ongoing debate as to whether or not dopamine encodes uncertainty in the delay period between presentation of a conditioned stimulus and receipt of a reward, as demonstrated by sustained activation seen in single dopamine neuron recordings (Fiorillo, Tobler & Schultz 2003). An answer to this question could result in a better understanding of the nature of dopamine signaling, with implications for the psychopathology of cognitive disorders, like schizophrenia, for which dopamine is commonly regarded as having a primary role. Computational modelling enabled me to suggest that while sustained activation is common in single trials, it may increase with increasing probability, in which case dopamine may not be encoding uncertainty in this manner. Importantly, these predictions can be tested against experimental data.

My third modelling theme arose from the limitations of using TD alone to provide a reinforcement learning account of action control in the brain. In Chapter 8 I introduce a dual-weighted artificial neural network, originally designed by Hinton and Plaut (1987) to address the problem of catastrophic forgetting in multilayer artificial neural networks. I suggest an alternative use for a model with fast and slow weights: to address the problem of arbitration between two systems of control. This novel approach is capable of combining the benefits of model-free and model-based learning in one simple model, without need for a homunculus, and may have important implications for addressing how both goal-directed and stimulus-response learning may coexist.
Modelling cortical-subcortical loops offers the potential of incorporating both the symptoms and cognitive deficits associated with schizophrenia by taking into account the interactions between midbrain/striatum and cortical areas.
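The dual-weight idea borrowed from Hinton and Plaut (1987) reduces to a small modification of the delta rule. The sketch below is a minimal reconstruction under assumed hyperparameters: each connection carries a slow weight plus a fast, decaying weight, their sum is the effective weight, and the fast component provides the rapid, temporary overlay that the thesis proposes for arbitration between control systems.

```python
import numpy as np

# Minimal sketch (assumed details, after Hinton & Plaut 1987) of a dual-weight
# linear unit: effective weight = slow weight + fast weight, where the fast
# weights learn quickly but decay, overlaying slowly acquired knowledge.

rng = np.random.default_rng(2)
n_in = 8
w_slow = np.zeros(n_in)
w_fast = np.zeros(n_in)
lr_slow, lr_fast, decay_fast = 0.01, 0.5, 0.5

def predict(x):
    return x @ (w_slow + w_fast)        # effective weight is the sum

def train_step(x, target):
    global w_slow, w_fast
    error = target - predict(x)         # delta rule on a linear unit
    w_slow = w_slow + lr_slow * error * x              # small, durable changes
    w_fast = decay_fast * w_fast + lr_fast * error * x # large but decaying changes
    return abs(error)

# The fast weights track the current context (cf. model-based, goal-directed
# control) while the slow weights accumulate stable stimulus-response habits.
x = rng.normal(size=n_in)
for _ in range(5):
    print(train_step(x, target=1.0))
```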